Abstract

The root cause for confidentiality and integrity attacks against computing systems is insecure information flow. The complexity of modern systems poses a major challenge to secure end-to-end information flow, which must ensure that the insecurity of a single component does not render the entire system insecure. While information flow in a variety of languages and settings has been thoroughly studied in isolation, the problem of tracking information across component boundaries has been largely out of reach of the work so far. This is unsatisfactory because tracking information across component boundaries is necessary for end-to-end security.

This paper proposes a framework for uniform tracking of information flow through both the application and the underlying database. A key enabler of the uniform treatment is recent work by Cheney et al., which studies database manipulation via an embedded language-integrated query language (with Microsoft's LINQ on the backend). Because both the host language and the embedded query language are functional F#-like languages, we are able to leverage information-flow enforcement for functional languages to obtain information-flow control for databases "for free", synergize it with information-flow control for applications, and thus guarantee security across application-database boundaries. We develop the formal results in the form of a security type system that includes a treatment of algebraic data types and pattern matching, and establish its soundness. On the practical side, we implement the framework and demonstrate its usefulness in a case study with a realistic movie rental database.

Categories and Subject Descriptors D.4.6 [Operating Systems]: 
Security and Protection — information-flow controls 

Keywords end-to-end security, information flow, static analysis, 
language-integrated queries 

1. Introduction 

Increasingly, we trust interconnected software on desktops, laptops, tablets, and smart phones to manipulate a wide range of sensitive information such as medical, commercial, and location information. This trust can be justified only if the software is designed, constructed, monitored, and audited to be robust and secure.

Securing heterogeneous systems Heterogeneity is a major roadblock in the path of software security. Modern computing systems are built from a large number of components, which often run on different platforms and are written in multiple programming languages.
It is not surprising that systems often break at component boundaries. The OWASP Top 10 project identifies the ten most critical web application security risks [2]. The top of the list is dominated by attacks across component boundaries: injection attacks (with SQL injection as the prime example) are number 1 on the list; cross-site scripting attacks are number 3. In both, untrusted data bypasses inter-component filtering, which leads to the execution of malicious commands (commonly in SQL or JavaScript) that compromise confidentiality and integrity.

In the face of the complexity and heterogeneity of today's systems, it is vital to ensure end-to-end security [45] that spans component boundaries.

Information-flow control The root cause for confidentiality and integrity attacks against computing systems is insecure information flow. For confidentiality, this means that information may leak from sensitive sources to attacker-observable sinks. For integrity, this means that data from untrusted sources may compromise data at trusted sinks.

Enforcing secure information flow is more involved than enforcing safety properties like tracking units of measure [33] or taint tracking [47]. This is because there are two different types of information flows. The first type, explicit flows, originates from the explicit propagation of values, e.g., via parameter passing. Tracking this kind of flow is similar to tracking units of measure or taint tracking. The second type, implicit flows [25], corresponds to flows via the control flow. Consider

let l = if h then true else false

Depending on the value of h, either the then branch or the else branch of the conditional is evaluated to produce the final result. In the above program, this has the effect of leaking the Boolean value of h into l, constituting an implicit flow from h to l. Tracking this kind of flow requires different machinery, which distinguishes the enforcement of secure information flow from the enforcement of safety properties [51].
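To make the distinction concrete, the following Python sketch (an illustration only, not part of the formal development; `Level`, `taint_label`, and `ifc_label` are hypothetical names) labels the result of `l = if h then true else false` in two ways. Taint tracking joins only the labels of the values that are actually propagated, whereas information-flow tracking also joins in the level of the branch condition.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical two-point lattice: L (public) below H (secret)."""
    L = 0
    H = 1


def join(a: Level, b: Level) -> Level:
    return Level(max(a, b))


def program(h: bool) -> bool:
    # l = if h then true else false
    return True if h else False


def taint_label(h_level: Level) -> Level:
    """Naive taint tracking: only labels of propagated values count.
    Both branches yield public constants, so l looks public."""
    true_level, false_level = Level.L, Level.L
    return join(true_level, false_level)


def ifc_label(h_level: Level) -> Level:
    """Information-flow tracking: the branch taken depends on h,
    so h's level is joined into the result (the implicit flow)."""
    true_level, false_level = Level.L, Level.L
    return join(h_level, join(true_level, false_level))


# Two runs that differ only in the secret h give different values for l ...
print(program(True), program(False))  # True False
# ... yet taint tracking labels l as public, while IFC labels it secret.
print(taint_label(Level.H).name, ifc_label(Level.H).name)  # L H
```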

A large body of work, extensively surveyed [13, 29, 30, 41], has studied information-flow control. However, with a few recent exceptions (discussed in Section 6), the problem of information flow for different components has largely been explored in isolation. This is unsatisfactory because tracking information across component boundaries is necessary for end-to-end security.

Motivated by the above, this paper focuses on information-flow 
control for systems with database components. 

Database integration Programs commonly access databases via 
libraries that connect and interact with the database. If we take SQL 
as an example, querying is typically done by constructing a query 
string that is passed to the database as illustrated below. 

let query = "SELECT Name FROM People"
let result = SqlCommand(query, db).execute()

The problem with this approach is that the queries are constructed at runtime without any guarantees about the query. In general, it is hard to verify that the constructed queries are meaningful, let alone decide information-flow properties for them. The created string could be an invalid query or even the result of an SQL injection. Further, the returned information is by necessity encoded in a generic way, which makes it both inefficient and error-prone to work with. Instead, it is attractive to integrate the database query mechanism into the language, as facilitated, e.g., by Google's Web Toolkit [4], Ruby on Rails [1], and Microsoft's LINQ [3].
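The following Python sketch, using the standard `sqlite3` module and a made-up People table, illustrates the problem with string-built queries: the query text is only checked once it reaches the database, and untrusted input can change its meaning. The helper `names_older_than` is hypothetical and serves only as an illustration.

```python
import sqlite3


def make_people_db() -> sqlite3.Connection:
    """In-memory People table with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE People (Name TEXT, Age INTEGER)")
    conn.executemany(
        "INSERT INTO People VALUES (?, ?)",
        [("Alice", 30), ("Bob", 52), ("Carol", 41)],
    )
    return conn


def names_older_than(conn: sqlite3.Connection, min_age: str) -> list:
    # The query is an ordinary string assembled at runtime; nothing checks
    # that it is well-formed, let alone what information it may reveal.
    query = "SELECT Name FROM People WHERE Age > " + min_age
    return sorted(row[0] for row in conn.execute(query))


conn = make_people_db()
print(names_older_than(conn, "40"))           # ['Bob', 'Carol']
print(names_older_than(conn, "40 OR 1 = 1"))  # ['Alice', 'Bob', 'Carol']
try:
    names_older_than(conn, "40 AND")          # malformed query text
except sqlite3.Error as err:
    print("rejected only at runtime:", type(err).__name__)
```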

In a functional setting, an elegant approach to providing language-integrated query is to use meta-programming based on quotations and antiquotations. This is the approach taken by Cheney et al. [16]. The goal is to provide access to SQL databases in F# (with Microsoft's LINQ on the backend). F# provides quotation via <@ @>, which creates a typed representation of a given F# expression e. Assuming that e has type t, <@ e @> is a value of type Expr(t). Antiquotations (%e) provide a way to splice typed quoted values into other quoted expressions. This approach capitalizes on the flexible meta-programming capabilities of F# [50]. With this framework we can express the above query in F# in the following way.

let query =
    <@ for p in (%db).People do
         yield p.Name @>

let result = run query

From the type of the spliced-in database, db, the type system of F# is able to determine that query has type Expr(string list). In turn, query is passed to run, which, when executed, performs the query and returns a list of strings. The program is typed at compile time, whereas the actual query is created and executed at runtime. At runtime, the quoted expression is parsed by the F# runtime and the typed result is passed to run for normalization and evaluation. This produces and performs the actual SQL query. Note how antiquotation is used to splice in the database, allowing the construction of multiple queries using the same database connection.
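As a rough model of quotation and antiquotation (not the F# or LINQ machinery itself), the following Python sketch represents quoted terms as data, resolves antiquotations by inlining the quoted term they refer to, and then evaluates the result over in-memory tables instead of generating SQL. All constructor and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Database:   # database(db)
    name: str


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Field:      # e.f
    target: Any
    name: str


@dataclass(frozen=True)
class Yield:      # yield e
    body: Any


@dataclass(frozen=True)
class For:        # for x in source do body
    var: str
    source: Any
    body: Any


@dataclass(frozen=True)
class Splice:     # antiquotation (%e); e must itself be a quoted term
    quoted: Any


def splice(term):
    """Resolve antiquotations by inlining the quoted term they refer to."""
    if isinstance(term, Splice):
        return splice(term.quoted)
    if isinstance(term, Field):
        return Field(splice(term.target), term.name)
    if isinstance(term, Yield):
        return Yield(splice(term.body))
    if isinstance(term, For):
        return For(term.var, splice(term.source), splice(term.body))
    return term  # Database and Var have no subterms


def run(term, databases, env=None):
    """Evaluate a splice-free quoted term against in-memory tables."""
    env = env or {}
    if isinstance(term, Database):
        return databases[term.name]
    if isinstance(term, Var):
        return env[term.name]
    if isinstance(term, Field):
        return run(term.target, databases, env)[term.name]
    if isinstance(term, Yield):
        return [run(term.body, databases, env)]
    if isinstance(term, For):
        result = []
        for row in run(term.source, databases, env):
            result += run(term.body, databases, {**env, term.var: row})
        return result
    raise TypeError(f"unexpected term {term!r}")


db = Database("PeopleDB")                      # let db = <@ database("PeopleDB") @>
query = For("p", Field(Splice(db), "People"),  # <@ for p in (%db).People do
            Yield(Field(Var("p"), "Name")))    #      yield p.Name @>
tables = {"PeopleDB": {"People": [{"Name": "Alice", "Age": 30},
                                  {"Name": "Bob", "Age": 52}]}}
print(run(splice(query), tables))  # ['Alice', 'Bob']
```

Because `db` is an ordinary quoted value, it can be spliced into any number of queries, mirroring how the antiquotation `(%db)` lets several queries share one database reference.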



Contributions This paper puts homogeneous meta-programming to work to develop information-flow type systems for heterogeneous systems. In particular, we present an information-flow type system for a subset of F# with database queries. The presented development is an instance of a general method that allows for the reuse of existing type systems to create information-flow type systems that seamlessly span language boundaries. Thus, the method is not limited to database queries.

Because both the host language and the embedded query language are F#-like subsets, we are able to leverage information-flow enforcement for functional languages to obtain information-flow control for databases "for free". The simplicity of the resulting type system and the relatively small modifications needed are evidence for the success of the approach.

In a nutshell, the paper contains the following main contribution:

(i) We leverage homogeneous meta-programming to provide 
information-flow security for a subset of F# including database 
access via the essence of query processing in Microsoft LINQ, as 
it is expressed in F#. 

In addition, the paper contains further contributions: 

(ii) We develop the formal results in the form of a security type system and show that it enforces the security condition of noninterference [28] (Section 2).

(iii) We develop an analysis to treat algebraic data types and pattern matching, establish its soundness, and implement it as part of our prototype (Section 4).

(iv) We present an implementation of the type checker and a translator from our language to executable F# code (Section 3).

(v) We demonstrate the usefulness of our framework by a case study with a realistic movie rental database (Section 5).



The full soundness proof and the code of the framework and case study are available online.

2. Framework 

This section presents a simple functional language with support for 
product types, records, lists, quoted expressions and antiquotations, 
the security type system, and shows that the type system enforces 
information-flow security with respect to a small-step semantics. 

Recall that the fundamental idea is that, since the information flow of the database interaction is fully described in the quoted language, the type system is able to enforce information-flow security for the database interactions for free.

2.1 Language 

The language is based on the one used by Cheney et al. [16], with the addition of security levels to the type system.

Figure 1 shows the syntax of security levels, types, and terms. We write $\overline{x}$ to denote a sequence of entities $x$. For example, $\overline{f : t}$ is shorthand for a sequence $f_1 : t_1, f_2 : t_2, \ldots, f_n : t_n$ of typings of record fields.

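$$
\begin{array}{lcl}
\ell & ::= & L \mid H \\
b & ::= & \mathsf{int}^{\ell} \mid \mathsf{string}^{\ell} \mid \mathsf{bool}^{\ell} \\
t & ::= & b \mid t \to t \mid t * t \mid \{\overline{f : t}\} \mid (t\ \mathsf{list})^{\ell} \mid \mathsf{Expr}(t) \\
T & ::= & (\{\overline{f : b}\}\ \mathsf{list})^{\ell} \\
\Gamma, \Delta & ::= & \cdot \mid \Gamma, x : t \\
e & ::= & c \mid x \mid op(\overline{e}) \mid \mathsf{lift}\ e \mid \mathsf{fun}(x) \to e \mid \mathsf{rec}\ f(x) \to e \mid e\ e \mid (e, e) \\
  &     & \mid \mathsf{fst}\ e \mid \mathsf{snd}\ e \mid \{\overline{f = e}\} \mid e.f \mid \mathsf{yield}\ e \mid [\,] \\
  &     & \mid e \mathbin{@} e \mid \mathsf{for}\ x\ \mathsf{in}\ e\ \mathsf{do}\ e \mid \mathsf{exists}\ e \mid \mathsf{if}\ e\ \mathsf{then}\ e \mid \mathsf{run}\ e \\
  &     & \mid \texttt{<@}\ e\ \texttt{@>} \mid (\%e) \mid \mathsf{database}(db)
\end{array}
$$

Figure 1. Syntax of language and types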

We remark on some of the interesting constructs: $c$ denotes built-in constants, such as integers and booleans. $op$ denotes built-in operators, such as addition and logical connectives. lift $e$ lifts an expression of type $t$ to type Expr($t$). for $x$ in $e_1$ do $e_2$ is used to express list comprehensions, where $x$ is bound successively to the elements of $e_1$ when evaluating $e_2$; the results of evaluating $e_2$ for each element are then concatenated. run $e$ denotes running a quoted expression $e$, which involves generating an SQL query based on the quoted term (Section 2.2 provides further details). $e_1$ @ $e_2$ denotes the concatenation of $e_1$ and $e_2$. exists $e$ evaluates to true if and only if the expression $e$ does not evaluate to the empty list; this can be used to check whether the result of a query is empty. Similarly, if $e_1$ then $e_2$ evaluates to $e_2$ if $e_1$ evaluates to true, and to [] otherwise. yield $e$ denotes a singleton list consisting of expression $e$. <@ $e$ @> denotes a quoted expression $e$. The language allows only closed quoted terms, since this simplifies the semantics of the language while still being able to express all the desired concepts. Quoted functions can be expressed by abstracting in the quoted term as opposed to abstracting on the level of the host language. (%$e$) denotes antiquotation of the expression $e$, and allows splicing of quoted expressions into quoted expressions in a type-safe way.

Security type language The security type language is defined by annotating a standard type language for a functional fragment with quotations with security levels $\ell$. Without loss of generality, the security levels are taken from the two-element security lattice consisting of a level L for non-confidential information and a level H for confidential information. Information-flow integrity policies can be expressed dually [14]. The types are split into base types ($b$), which can occur as types of columns in tables ($T$), and general types ($t$), which include function types, lists, and quoted expressions.
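The two-element lattice can be written down directly. The following Python sketch (hypothetical names `Level`, `leq`, and `join`; the dual integrity lattice is not modelled) gives the ordering L ⊑ H and the join, and checks that the join is indeed the least upper bound.

```python
from enum import Enum
from itertools import product


class Level(Enum):
    L = "L"  # non-confidential
    H = "H"  # confidential


def leq(a: Level, b: Level) -> bool:
    """a ⊑ b iff a = L, or a = b = H."""
    return a is Level.L or b is Level.H


def join(a: Level, b: Level) -> Level:
    """Least upper bound: H if either argument is H, otherwise L."""
    return Level.H if Level.H in (a, b) else Level.L


# Sanity check: join really is the least upper bound with respect to leq.
for a, b in product(Level, repeat=2):
    j = join(a, b)
    assert leq(a, j) and leq(b, j)
    assert all(leq(j, c) for c in Level if leq(a, c) and leq(b, c))

print([(a.name, b.name, join(a, b).name) for a, b in product(Level, repeat=2)])
# [('L', 'L', 'L'), ('L', 'H', 'H'), ('H', 'L', 'H'), ('H', 'H', 'H')]
```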

As is common, we consider a database to be a collection of tables. Each table consists of at least one named column, each of which has a type annotated with a fixed security level. The security levels on the column types express which columns contain confidential data and which do not.

To express security policies for databases, each database is 
given a type signature. Such a type signature describes tables as 
lists of records. Each record field corresponds to a column in the 
sense that the field name matches the name of the column in the 
database. A column is specified as confidential or public by using a 
suitable type for the corresponding field in the record. The ordering 
of elements in a list used to represent table contents is irrelevant. 

To illustrate the addition of security levels to the type system in the case of databases, consider an example, adapted from Cheney et al. [16], involving a database of people and couples, PeopleDB. In this scenario, we assume that the names of people are confidential, while their ages are not, which leads to the following type for PeopleDB.

PeopleDB :
  { People :
      { Id : int^L; Name : string^H; Age : int^L } list^L
  ; Couples :
      { Person1 : int^L; Person2 : int^L } list^L
  }

Now consider the situation where we want to query the database for couples where one partner is more than 10 years older than the other. This can be done by iterating once over all couples in the database and, nested within that, twice over all people in the database. For each couple and pair of persons, one checks whether the two persons form the couple under consideration and whether their age difference exceeds 10. If so, the name of the first partner, along with the age difference, is returned as part of the result, which is a list of records consisting of a name and an age difference.

let db = <@ database "PeopleDB" @>

type ResultType = { name : string^H; diff : int^L }

let differences : Expr<(ResultType list)^L> =
    <@ for c in (%db).Couples do
         for p1 in (%db).People do
           for p2 in (%db).People do
             if (c.Person1 = p1.Id) &&
                (c.Person2 = p2.Id) &&
                (abs (p1.Age - p2.Age) > 10) then
               yield { name = p1.Name; diff = p1.Age - p2.Age } @>

let main = run differences

As can be seen in the above program, the information-flow policy for this program is specified by giving a type annotation to the quoted expression that generates the query, i.e., a type annotation for differences. In particular, the name components of the result are typed confidential, while the age differences are public. This matches the policy specified for the database contents, in which the names of people are confidential while their ages are not. The type system ensures that the result type of differences is in fact compatible with the policy specified for the database. Changing the security annotation of the name field from secret to public as follows results in a type error.

// No longer well-typed:
type ResultType = { name : string^L; diff : int^L }

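The check behind this type error can be sketched in Python under a simplified encoding (a hypothetical dictionary from field names to base types and levels; this is not the paper's implementation): for each result field, the level of the data flowing into it must be below or equal to the declared level.

```python
# Hypothetical encoding of the PeopleDB policy: field -> (base type, level).
PEOPLE = {"Id": ("int", "L"), "Name": ("string", "H"), "Age": ("int", "L")}


def join(a: str, b: str) -> str:
    return "H" if "H" in (a, b) else "L"


def flows_to(src: str, dst: str) -> bool:
    """src ⊑ dst in the two-point lattice."""
    return src == "L" or dst == "H"


# Levels of the data that reach each field of the yielded record:
#   name = p1.Name          -> level of Name
#   diff = p1.Age - p2.Age  -> join of the operand levels (rule OP)
inferred = {
    "name": PEOPLE["Name"][1],
    "diff": join(PEOPLE["Age"][1], PEOPLE["Age"][1]),
}

# The if-condition reads only Id, Person1/Person2 and Age (all L),
# so the list itself may keep level L, as in Expr<(ResultType list)^L>.
condition_level = join(PEOPLE["Id"][1], PEOPLE["Age"][1])


def ill_typed_fields(declared: dict) -> list:
    """Fields whose declared level is too low for the data flowing into them."""
    return [f for f, (_base, lvl) in declared.items() if not flows_to(inferred[f], lvl)]


result_type = {"name": ("string", "H"), "diff": ("int", "L")}
public_names = {"name": ("string", "L"), "diff": ("int", "L")}
print(ill_typed_fields(result_type))   # []
print(ill_typed_fields(public_names))  # ['name']
```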
2.2 Operational Semantics 

We write $e \to_{\Omega} e'$ for the evaluation of an expression $e$ to another expression $e'$ using the database data in $\Omega$. $\Omega$ is a function that maps database names to the actual contents of the databases they refer to, and $\delta$ is a mapping that maps operators to their corresponding semantics. $\Sigma$ maps constants and databases to their respective types.

We assume that $\Omega$ is consistent with the typing for databases given in $\Sigma$: for each database $db$, $\Omega(db)$ is assumed to be a value of type $\Sigma(db)$.

The evaluation rules in Figures 2, 3, 4, and 5 follow [16]. Let $\to^{*}_{\Omega}$ be the reflexive-transitive closure of $\to_{\Omega}$. Evaluation and normalization of the quoted language is denoted by $\mathit{eval}_{\Omega}(\mathit{norm}(e))$. This evaluation entails generating database queries that can be executed by actual database servers. In particular, higher-order features such as nested records or function applications need to be evaluated to obtain computations that can be expressed in SQL. Figure 6 shows the syntax of the resulting normalized terms. The semantics is call-by-value with left-to-right evaluation of terms. This is formalized using evaluation contexts $\mathcal{E}$. Quotation contexts $\mathcal{Q}$ are used to ensure that there are no antiquotations to the left of the hole.

We denote substitution of free occurrences of a variable $x$ in expression $e$ with another expression $e'$ by $e[x \mapsto e']$.



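$$
\begin{array}{lcl}
V & ::= & c \mid \mathsf{fun}(x) \to e \mid \mathsf{rec}\ f(x) \to e \mid (V, V) \mid \{\overline{f = V}\} \\
  &     & \mid [\,] \mid \mathsf{yield}\ V \mathbin{@} \cdots \mathbin{@} \mathsf{yield}\ V \mid \texttt{<@}\ Q\ \texttt{@>} \\[1ex]
Q & ::= & c \mid op(\overline{Q}) \mid \mathsf{lift}\ Q \mid x \mid \mathsf{fun}(x) \to Q \mid Q\ Q \mid (Q, Q) \\
  &     & \mid \{\overline{f = Q}\} \mid Q.f \mid \mathsf{yield}\ Q \mid [\,] \mid Q \mathbin{@} Q \mid \mathsf{for}\ x\ \mathsf{in}\ Q\ \mathsf{do}\ Q \\
  &     & \mid \mathsf{exists}\ Q \mid \mathsf{if}\ Q\ \mathsf{then}\ Q \mid \mathsf{database}(db) \\[1ex]
\mathcal{E} & ::= & [\,] \mid op(\overline{V}, \mathcal{E}, \overline{e}) \mid \mathsf{lift}\ \mathcal{E} \mid \mathcal{E}\ e \mid V\ \mathcal{E} \mid (\mathcal{E}, e) \mid (V, \mathcal{E}) \\
  &     & \mid \{\overline{f = V}, f' = \mathcal{E}, \overline{f'' = e}\} \mid \mathcal{E}.f \mid \mathsf{yield}\ \mathcal{E} \mid \mathcal{E} \mathbin{@} e \mid V \mathbin{@} \mathcal{E} \\
  &     & \mid \mathsf{for}\ x\ \mathsf{in}\ \mathcal{E}\ \mathsf{do}\ e \mid \mathsf{exists}\ \mathcal{E} \mid \mathsf{if}\ \mathcal{E}\ \mathsf{then}\ e \mid \mathsf{run}\ \mathcal{E} \\
  &     & \mid \texttt{<@}\ \mathcal{Q}[(\%\mathcal{E})]\ \texttt{@>} \\[1ex]
\mathcal{Q} & ::= & [\,] \mid op(\overline{Q}, \mathcal{Q}, \overline{e}) \mid \mathsf{fun}(x) \to \mathcal{Q} \mid \mathsf{lift}\ \mathcal{Q} \mid \mathcal{Q}\ e \mid Q\ \mathcal{Q} \\
  &     & \mid (\mathcal{Q}, e) \mid (Q, \mathcal{Q}) \mid \{\overline{f = Q}, f' = \mathcal{Q}, \overline{f'' = e}\} \mid \mathcal{Q}.f \\
  &     & \mid \mathsf{yield}\ \mathcal{Q} \mid \mathcal{Q} \mathbin{@} e \mid Q \mathbin{@} \mathcal{Q} \mid \mathsf{for}\ x\ \mathsf{in}\ \mathcal{Q}\ \mathsf{do}\ e \mid \mathsf{for}\ x\ \mathsf{in}\ Q\ \mathsf{do}\ \mathcal{Q} \\
  &     & \mid \mathsf{exists}\ \mathcal{Q} \mid \mathsf{if}\ \mathcal{Q}\ \mathsf{then}\ e \mid \mathsf{if}\ Q\ \mathsf{then}\ \mathcal{Q} \mid \mathsf{run}\ \mathcal{Q}
\end{array}
$$

Figure 2. Values and evaluation contexts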



2.3 Security Condition 

The goal of the type system is to enforce a notion of noninterference for a functional language. Noninterference formalizes computational independence between secrets and non-secrets, guaranteeing that no information about the former can be inferred from the latter. More precisely, this is expressed as the preservation of an equivalence relation under pairwise execution: given two inputs that are equal in the components that are visible to an attacker, evaluation should result in two output values that also coincide in the components that can be observed by the attacker.

To that end this section introduces a notion of low-equivalence 
denoted by ~ that demands that parts of values with types that are 
annotated with L are equal, while placing no demands on the secret 
counterparts. More formally, we introduce a family of equivalence 
relations on values parametrized by types. 
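$$
\begin{array}{rcl}
op(\overline{V}) & \to & \delta(op, \overline{V}) \\
(\mathsf{fun}(x) \to M)\ V & \to & M[x \mapsto V] \\
(\mathsf{rec}\ f(x) \to M)\ V & \to & M[f \mapsto \mathsf{rec}\ f(x) \to M,\ x \mapsto V] \\
\mathsf{fst}\ (V_1, V_2) & \to & V_1 \\
\mathsf{snd}\ (V_1, V_2) & \to & V_2 \\
\{\overline{f = V}\}.f_i & \to & V_i \\
\mathsf{if}\ \mathsf{true}\ \mathsf{then}\ M & \to & M \\
\mathsf{if}\ \mathsf{false}\ \mathsf{then}\ M & \to & [\,] \\
\mathsf{for}\ x\ \mathsf{in}\ \mathsf{yield}\ V\ \mathsf{do}\ N & \to & N[x \mapsto V] \\
\mathsf{for}\ x\ \mathsf{in}\ [\,]\ \mathsf{do}\ N & \to & [\,] \\
\mathsf{for}\ x\ \mathsf{in}\ L \mathbin{@} M\ \mathsf{do}\ N & \to & (\mathsf{for}\ x\ \mathsf{in}\ L\ \mathsf{do}\ N) \mathbin{@} (\mathsf{for}\ x\ \mathsf{in}\ M\ \mathsf{do}\ N) \\
\mathsf{exists}\ [\,] & \to & \mathsf{false} \\
\mathsf{exists}\ [\overline{V}] & \to & \mathsf{true}, \quad |\overline{V}| > 0 \\
\mathsf{run}\ \texttt{<@}\ Q\ \texttt{@>} & \to & \mathit{eval}_{\Omega}(\mathit{norm}(Q)) \\
\mathsf{lift}\ c & \to & \texttt{<@}\ c\ \texttt{@>} \\
\texttt{<@}\ \mathcal{Q}[(\%\texttt{<@}\ Q\ \texttt{@>})]\ \texttt{@>} & \to & \texttt{<@}\ \mathcal{Q}[Q]\ \texttt{@>}
\end{array}
\qquad
\dfrac{M \to N}{\mathcal{E}[M] \to \mathcal{E}[N]}
$$

Figure 3. Evaluation rules for host language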


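$$
\begin{array}{rcl}
(\mathsf{fun}(x) \to R)\ Q & \rightsquigarrow & R[x \mapsto Q] \\
\{\overline{f = Q}\}.f_i & \rightsquigarrow & Q_i \\
\mathsf{for}\ x\ \mathsf{in}\ \mathsf{yield}\ Q\ \mathsf{do}\ R & \rightsquigarrow & R[x \mapsto Q] \\
\mathsf{for}\ y\ \mathsf{in}\ (\mathsf{for}\ x\ \mathsf{in}\ P\ \mathsf{do}\ Q)\ \mathsf{do}\ R & \rightsquigarrow & \mathsf{for}\ x\ \mathsf{in}\ P\ \mathsf{do}\ (\mathsf{for}\ y\ \mathsf{in}\ Q\ \mathsf{do}\ R) \\
\mathsf{for}\ x\ \mathsf{in}\ (\mathsf{if}\ P\ \mathsf{then}\ Q)\ \mathsf{do}\ R & \rightsquigarrow & \mathsf{if}\ P\ \mathsf{then}\ (\mathsf{for}\ x\ \mathsf{in}\ Q\ \mathsf{do}\ R) \\
\mathsf{for}\ x\ \mathsf{in}\ [\,]\ \mathsf{do}\ N & \rightsquigarrow & [\,] \\
\mathsf{for}\ x\ \mathsf{in}\ (P \mathbin{@} Q)\ \mathsf{do}\ R & \rightsquigarrow & (\mathsf{for}\ x\ \mathsf{in}\ P\ \mathsf{do}\ R) \mathbin{@} (\mathsf{for}\ x\ \mathsf{in}\ Q\ \mathsf{do}\ R) \\
\mathsf{if}\ \mathsf{true}\ \mathsf{then}\ Q & \rightsquigarrow & Q \\
\mathsf{if}\ \mathsf{false}\ \mathsf{then}\ Q & \rightsquigarrow & [\,]
\end{array}
$$

Figure 4. Symbolic reduction phase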

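$$
\begin{array}{rcl}
\mathsf{for}\ x\ \mathsf{in}\ P\ \mathsf{do}\ (Q \mathbin{@} R) & \hookrightarrow & (\mathsf{for}\ x\ \mathsf{in}\ P\ \mathsf{do}\ Q) \mathbin{@} (\mathsf{for}\ x\ \mathsf{in}\ P\ \mathsf{do}\ R) \\
\mathsf{for}\ x\ \mathsf{in}\ P\ \mathsf{do}\ [\,] & \hookrightarrow & [\,] \\
\mathsf{if}\ P\ \mathsf{then}\ (Q \mathbin{@} R) & \hookrightarrow & (\mathsf{if}\ P\ \mathsf{then}\ Q) \mathbin{@} (\mathsf{if}\ P\ \mathsf{then}\ R) \\
\mathsf{if}\ P\ \mathsf{then}\ [\,] & \hookrightarrow & [\,] \\
\mathsf{if}\ P\ \mathsf{then}\ (\mathsf{if}\ Q\ \mathsf{then}\ R) & \hookrightarrow & \mathsf{if}\ P \mathbin{\&\&} Q\ \mathsf{then}\ R \\
\mathsf{if}\ P\ \mathsf{then}\ (\mathsf{for}\ x\ \mathsf{in}\ Q\ \mathsf{do}\ R) & \hookrightarrow & \mathsf{for}\ x\ \mathsf{in}\ Q\ \mathsf{do}\ (\mathsf{if}\ P\ \mathsf{then}\ R)
\end{array}
$$

Figure 5. Ad-hoc reduction phase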
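A small fragment of these rewrite rules can be sketched in Python over a hypothetical tuple encoding of quoted terms. The sketch covers only the `for` rules of Figure 4 and the two `for` rules of Figure 5 (the `fun`, projection, and `if` rules are omitted) and assumes that all bound variables have distinct names, so substitution can ignore variable capture; a real normalizer cannot make that assumption.

```python
# Hypothetical tuple encoding of quoted terms:
#   ("var", x) | ("table", name) | ("field", term, f) | ("yield", term)
#   ("nil",) | ("union", a, b) | ("if", cond, body) | ("for", x, src, body)
# Assumes all bound variable names are distinct (no capture-avoidance).


def subst(term, name, value):
    """term[name |-> value] for free occurrences of `name`."""
    kind = term[0]
    if kind == "var":
        return value if term[1] == name else term
    if kind == "for":
        _, x, src, body = term
        new_body = body if x == name else subst(body, name, value)
        return ("for", x, subst(src, name, value), new_body)
    return (kind,) + tuple(
        subst(s, name, value) if isinstance(s, tuple) else s for s in term[1:]
    )


def step(t):
    """Apply one `for` rule at the root; return None if none applies."""
    if t[0] != "for":
        return None
    _, x, src, body = t
    if src[0] == "nil" or body[0] == "nil":  # for x in [] do N;  for x in P do []
        return ("nil",)
    if src[0] == "yield":                     # for x in yield Q do R
        return subst(body, x, src[1])
    if src[0] == "for":                       # for x in (for y in P do Q) do R
        _, y, p, q = src
        return ("for", y, p, ("for", x, q, body))
    if src[0] == "union":                     # for x in (P @ Q) do R
        _, p, q = src
        return ("union", ("for", x, p, body), ("for", x, q, body))
    if src[0] == "if":                        # for x in (if P then Q) do R
        _, cond, q = src
        return ("if", cond, ("for", x, q, body))
    if body[0] == "union":                    # for x in P do (Q @ R)
        _, q, r = body
        return ("union", ("for", x, src, q), ("for", x, src, r))
    return None


def normalize(t):
    """Normalize subterms first, then rewrite at the root until a fixpoint."""
    t = (t[0],) + tuple(normalize(s) if isinstance(s, tuple) else s for s in t[1:])
    r = step(t)
    return normalize(r) if r is not None else t


nested = ("for", "y",
          ("for", "x", ("table", "People"), ("yield", ("var", "x"))),
          ("yield", ("field", ("var", "y"), "Name")))
print(normalize(nested))
# ('for', 'x', ('table', 'People'), ('yield', ('field', ('var', 'x'), 'Name')))
```

The nested comprehension is flattened into a single `for` over the table, which is the shape of Figure 6 that can be translated to SQL.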

Definition 1 ($\sim_t$). The family of equivalence relations $\sim_t$ is defined inductively by the rules in Figure 7.



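$$
\begin{array}{lcl}
S & ::= & [\,] \mid X \mid X \mathbin{@} X \\
X & ::= & \mathsf{database}(db) \mid \mathsf{yield}\ Y \mid \mathsf{if}\ Z\ \mathsf{then}\ \mathsf{yield}\ Y \mid \mathsf{for}\ x\ \mathsf{in}\ \mathsf{database}(db).f\ \mathsf{do}\ X \\
Y & ::= & x \mid \{\overline{f = Z}\} \\
Z & ::= & c \mid x.f \mid op(\overline{Z}) \mid \mathsf{exists}\ S
\end{array}
$$

Figure 6. Normalized terms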

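$$
\dfrac{\ell = L \Rightarrow i = i'}{i \sim_{\mathsf{int}^{\ell}} i'}
\qquad
\dfrac{\ell = L \Rightarrow s = s'}{s \sim_{\mathsf{string}^{\ell}} s'}
\qquad
\dfrac{\ell = L \Rightarrow b = b'}{b \sim_{\mathsf{bool}^{\ell}} b'}
$$

$$
\dfrac{\forall \Omega_1, \Omega_2, v_1, v_2, v'_1, v'_2.\ (\Omega_1 \sim_{\Sigma} \Omega_2 \wedge v_1 \sim_t v_2 \wedge e_1[x \mapsto v_1] \to^{*}_{\Omega_1} v'_1 \wedge e_2[x \mapsto v_2] \to^{*}_{\Omega_2} v'_2) \Rightarrow v'_1 \sim_{t'} v'_2}
{\mathsf{fun}(x) \to e_1 \sim_{t \to t'} \mathsf{fun}(x) \to e_2}
$$

$$
\dfrac{\forall \Omega_1, \Omega_2, v_1, v_2, v'_1, v'_2.\ (\Omega_1 \sim_{\Sigma} \Omega_2 \wedge v_1 \sim_t v_2 \wedge e_1[f \mapsto \mathsf{rec}\ f(x) \to e_1,\ x \mapsto v_1] \to^{*}_{\Omega_1} v'_1 \wedge e_2[f \mapsto \mathsf{rec}\ f(x) \to e_2,\ x \mapsto v_2] \to^{*}_{\Omega_2} v'_2) \Rightarrow v'_1 \sim_{t'} v'_2}
{\mathsf{rec}\ f(x) \to e_1 \sim_{t \to t'} \mathsf{rec}\ f(x) \to e_2}
$$

$$
\dfrac{v_1 \sim_{t_1} w_1 \quad v_2 \sim_{t_2} w_2}{(v_1, v_2) \sim_{t_1 * t_2} (w_1, w_2)}
\qquad
\dfrac{\overline{v \sim_t w}}{\{\overline{f = v}\} \sim_{\{\overline{f : t}\}} \{\overline{f = w}\}}
\qquad
\dfrac{\ell = L \Rightarrow (|\overline{v}| = |\overline{w}| \wedge \overline{v \sim_t w})}{[\overline{v}] \sim_{(t\ \mathsf{list})^{\ell}} [\overline{w}]}
$$

$$
\dfrac{\forall \Omega_1, \Omega_2.\ \Omega_1 \sim_{\Sigma} \Omega_2 \Rightarrow \mathit{eval}_{\Omega_1}(\mathit{norm}(e_1)) \sim_t \mathit{eval}_{\Omega_2}(\mathit{norm}(e_2))}{e_1 \sim_{\mathsf{Expr}(t)} e_2}
$$

Figure 7. Introduction rules for $\sim_t$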



When the type is evident from the context, we omit the subscript on $\sim$. Moreover, we also write $\sim$ for sequences of values.

To present the relations in a more concise manner, we combine the cases for different security levels using implication in the premises; e.g., equality on base types is only required if the security level is L.

Base types are compared using ordinary equality if the values are considered public. In the case of function types and quoted expressions, $\sim_t$ corresponds to noninterference for the bodies of the functions and for the quoted terms, respectively.

Records are related by ~ if they contain the same fields, and 
each field's contents are also related by ~. Two lists are required to 
have the same length if the list type is annotated with L, but their 
contents may differ based on the element type. 

To illustrate this, consider two lists of integers $l_1$ = yield 1 @ [] and $l_2$ = yield 2 @ []. If the lists are typed with the type $t = (\mathsf{int}^H\ \mathsf{list})^L$, the length of the list is considered public, while the contents are confidential. If, in contrast, the type is $t' = (\mathsf{int}^L\ \mathsf{list})^L$, neither the contents nor the length of the list is confidential. Hence $l_1 \sim_t l_2$ holds, while $l_1 \sim_{t'} l_2$ does not.
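The base-type and list cases of Figure 7 translate into a short recursive check. The Python sketch below uses a hypothetical tuple encoding of types (`low_equiv` is an illustrative name, and function and quoted-expression types are not covered) and reproduces the example above.

```python
# Hypothetical type encoding: ("int" | "string" | "bool", level) or
# ("list", element_type, level); list values are Python lists.


def low_equiv(v1, v2, t) -> bool:
    """v1 ~_t v2 from the point of view of an observer at level L."""
    kind = t[0]
    if kind in ("int", "string", "bool"):
        _, level = t
        return level == "H" or v1 == v2
    if kind == "list":
        _, elem, level = t
        if level == "H":
            return True  # neither the length nor the elements are observable
        return len(v1) == len(v2) and all(
            low_equiv(a, b, elem) for a, b in zip(v1, v2)
        )
    raise ValueError(f"unsupported type: {t!r}")


l1, l2 = [1], [2]                      # yield 1 @ [] and yield 2 @ []
t = ("list", ("int", "H"), "L")        # (int^H list)^L
t_prime = ("list", ("int", "L"), "L")  # (int^L list)^L
print(low_equiv(l1, l2, t), low_equiv(l1, l2, t_prime))  # True False
```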

For simplicity, $\sim_t$ is stated from the point of view of an observer at level L. $\sim_t$ can be generalized to an arbitrary lattice by parametrizing it with the level of the observer. Instead of checking whether the level annotation on the type is equal to H, one then checks whether it is not below or equal to the level of the observer.

Let $\Omega$ be a mapping from database names to database contents. We define low-equivalence for database mappings structurally in the following way.



Definition 2 ($\sim_{\Sigma}$). $\Omega_1 \sim_{\Sigma} \Omega_2$ holds if and only if for all databases $db$ it holds that $\Omega_1(db) \sim_{\Sigma(db)} \Omega_2(db)$.

With this we are ready to define the top-level notion of security, based on noninterference [28]. Since the family of low-equivalence relations is parametrized by types, the definition is given with respect to the initial database type and the final result type.

Definition 3 ($NI(e_1, e_2)_{\Sigma,t}$). Two expressions $e_1$ and $e_2$ are noninterfering with respect to the database type $\Sigma$ and the exit type $t$ if for all $\Omega_1$, $\Omega_2$, $v_1$, and $v_2$ such that $\Omega_1 \sim_{\Sigma} \Omega_2$ and $e_i \to^{*}_{\Omega_i} v_i$ for $i \in \{1, 2\}$, it holds that

$$v_1 \sim_t v_2$$

In particular, for any given closed expression $e$, $NI(e, e)_{\Sigma,t}$ should be read as: $e$ is secure with respect to the security policy expressed by $\Sigma$ and $t$, i.e., no secret part of the database, as defined by $\Sigma$, is able to influence the public parts of the returned value, as defined by $t$.
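Noninterference can be probed, though never proved, by running a program on a pair of low-equivalent databases and comparing the outputs with $\sim_t$. The Python sketch below uses made-up database contents and hypothetical helper names. The query `ages_of_long_names` filters on the secret Name column, so rule IF would give its result list level H rather than the claimed $(\mathsf{int}^L\ \mathsf{list})^L$, and a single pair of runs already exposes the leak. The type system, in contrast, covers all pairs at once.

```python
# Two databases that agree on the public Age column but differ on the
# confidential Name column (made-up contents), so they are ~_Sigma-related.
db1 = {"People": [{"Name": "Alice", "Age": 30}, {"Name": "Bob", "Age": 52}]}
db2 = {"People": [{"Name": "Zoe", "Age": 30}, {"Name": "Yann", "Age": 52}]}


def db_low_equiv(a, b) -> bool:
    """People : ({Name : string^H; Age : int^L} list)^L, checked structurally."""
    ra, rb = a["People"], b["People"]
    return len(ra) == len(rb) and all(x["Age"] == y["Age"] for x, y in zip(ra, rb))


def ages(db):
    """Reads only public data; result typed (int^L list)^L."""
    return [p["Age"] for p in db["People"]]


def ages_of_long_names(db):
    """Filters on the secret Name column; claiming (int^L list)^L is wrong."""
    return [p["Age"] for p in db["People"] if len(p["Name"]) > 3]


def low_equiv_public_ints(v1, v2) -> bool:
    # For (int^L list)^L, low-equivalence is plain equality.
    return v1 == v2


assert db_low_equiv(db1, db2)
print(low_equiv_public_ints(ages(db1), ages(db2)))                              # True
print(low_equiv_public_ints(ages_of_long_names(db1), ages_of_long_names(db2)))  # False
```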

As is common in this setting [38, 48], noninterference is termination-insensitive [41, 52] in the sense that leaks via the observation of (non)termination are ignored.

2.4 Type System 

Figure 8 presents the typing rules for the host language. Typing judgments are of the form $\Gamma \vdash e : t$, where $\Gamma$ is a typing context mapping variables to types, $e$ is an expression, and $t$ is a type; the judgment denotes that expression $e$ has type $t$ in context $\Gamma$. $\ell \sqcup \ell'$ denotes the join of levels $\ell$ and $\ell'$, i.e., $\ell \sqcup \ell' = H$ iff $H \in \{\ell, \ell'\}$, and $\ell \sqcup \ell' = L$ otherwise.

Figure 9 presents the typing rules for the quoted language. Typing judgments in the quoted language have the form $\Gamma; \Delta \vdash e : t$, where $\Gamma$ is the typing context for the host language and $\Delta$ is the typing context for the quoted language.

Most types carry a level annotation $\ell$ that denotes whether or not the "structure" of the value is confidential. In the case of base types such as int or string, this means whether or not their values are confidential. In the case of $(t\ \mathsf{list})^{\ell}$, the level $\ell$ indicates whether or not the length of the list is confidential. If $\ell = H$, the entire list value is considered a secret, but if $\ell = L$, the length of the list may be disclosed to a public observer. However, the elements of the list may or may not be confidential, depending on the level of the elements given by the type $t$.

Record types, functions, and quoted expression types do not carry an explicit level annotation, since their security level is contained in subcomponents of the type.

In the case of records, it suffices to annotate the type of each field, since the structure of a record cannot be modified dynamically. The confidentiality of a function is contained in the level annotation on the result type. The intuition is that, in the absence of side effects, the only way for a function to disclose information is via its result. For types of quoted expressions, i.e., types of the form Expr($t$), the level annotation is already contained in $t$.

We assume that types for operators, constants, and databases are given by the mapping $\Sigma$. Moreover, we also assume that each query only uses a single database.

The typing rules for expressions in the host language and expressions in the quoted language are nearly identical, with a few exceptions:

• Recursion is only allowed in the host language.

• Quotations are only allowed in the host language.

• Expressions of the form database($db$) are only allowed in the quoted language.

• Antiquotations are only allowed in the quoted language.



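$$
\begin{array}{c}
\textsf{Const}\ \dfrac{\Sigma(c) = t}{\Gamma \vdash c : t}
\qquad
\textsf{Var}\ \dfrac{x : t \in \Gamma}{\Gamma \vdash x : t}
\qquad
\textsf{Lift}\ \dfrac{\Gamma \vdash e : t}{\Gamma \vdash \mathsf{lift}\ e : \mathsf{Expr}(t)}
\\[3ex]
\textsf{Fun}\ \dfrac{\Gamma, x : t \vdash e : t'}{\Gamma \vdash \mathsf{fun}(x) \to e : t \to t'}
\qquad
\textsf{Rec}\ \dfrac{\Gamma, x : t, f : t \to t' \vdash e : t'}{\Gamma \vdash \mathsf{rec}\ f(x) \to e : t \to t'}
\\[3ex]
\textsf{Apply}\ \dfrac{\Gamma \vdash e_1 : t \to t' \quad \Gamma \vdash e_2 : t}{\Gamma \vdash e_1\ e_2 : t'}
\qquad
\textsf{Op}\ \dfrac{\Sigma(op) = \overline{t} \to t' \quad \Gamma \vdash \overline{e : t^{\ell}}}{\Gamma \vdash op(\overline{e}) : {t'}^{\,\bigsqcup \overline{\ell}}}
\\[3ex]
\textsf{Pair}\ \dfrac{\Gamma \vdash e_1 : t_1 \quad \Gamma \vdash e_2 : t_2}{\Gamma \vdash (e_1, e_2) : t_1 * t_2}
\qquad
\textsf{Fst}\ \dfrac{\Gamma \vdash e : t_1 * t_2}{\Gamma \vdash \mathsf{fst}\ e : t_1}
\qquad
\textsf{Snd}\ \dfrac{\Gamma \vdash e : t_1 * t_2}{\Gamma \vdash \mathsf{snd}\ e : t_2}
\\[3ex]
\textsf{Record}\ \dfrac{\Gamma \vdash \overline{M : t}}{\Gamma \vdash \{\overline{f = M}\} : \{\overline{f : t}\}}
\qquad
\textsf{Project}\ \dfrac{\Gamma \vdash L : \{\overline{f : t}\}}{\Gamma \vdash L.f_i : t_i}
\\[3ex]
\textsf{Yield}\ \dfrac{\Gamma \vdash M : t}{\Gamma \vdash \mathsf{yield}\ M : (t\ \mathsf{list})^{\ell}}
\qquad
\textsf{Nil}\ \dfrac{}{\Gamma \vdash [\,] : (t\ \mathsf{list})^{\ell}}
\qquad
\textsf{Union}\ \dfrac{\Gamma \vdash M : (t\ \mathsf{list})^{\ell} \quad \Gamma \vdash N : (t\ \mathsf{list})^{\ell'}}{\Gamma \vdash M \mathbin{@} N : (t\ \mathsf{list})^{\ell \sqcup \ell'}}
\\[3ex]
\textsf{For}\ \dfrac{\Gamma \vdash M : (t\ \mathsf{list})^{\ell} \quad \Gamma, x : t \vdash N : (t'\ \mathsf{list})^{\ell'}}{\Gamma \vdash \mathsf{for}\ x\ \mathsf{in}\ M\ \mathsf{do}\ N : (t'\ \mathsf{list})^{\ell \sqcup \ell'}}
\qquad
\textsf{Exists}\ \dfrac{\Gamma \vdash M : (t\ \mathsf{list})^{\ell}}{\Gamma \vdash \mathsf{exists}\ M : \mathsf{bool}^{\ell}}
\\[3ex]
\textsf{If}\ \dfrac{\Gamma \vdash L : \mathsf{bool}^{\ell} \quad \Gamma \vdash M : (t\ \mathsf{list})^{\ell'}}{\Gamma \vdash \mathsf{if}\ L\ \mathsf{then}\ M : (t\ \mathsf{list})^{\ell \sqcup \ell'}}
\qquad
\textsf{Run}\ \dfrac{\Gamma \vdash M : \mathsf{Expr}(t)}{\Gamma \vdash \mathsf{run}\ M : t}
\\[3ex]
\textsf{Quote}\ \dfrac{\Gamma; \cdot \vdash M : t}{\Gamma \vdash \texttt{<@}\ M\ \texttt{@>} : \mathsf{Expr}(t)}
\qquad
\textsf{Sub}\ \dfrac{\ell \leq \ell' \quad \Gamma \vdash M : t^{\ell}}{\Gamma \vdash M : t^{\ell'}}
\end{array}
$$

Figure 8. Type system for host language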

When lists are constructed using yield and [], they can be assigned an arbitrary level. Expressions of the form $e_1$ @ $e_2$ reveal information about the structure of both lists, and hence their security levels are combined in the result type. Similarly, exists only reveals information about the structure of the list, but nothing about its contents. Therefore, the security level of the list contents is discarded and only the security level of the list itself is present in the result type.
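The list rules discussed above can be sketched as small Python functions over a hypothetical tuple encoding of types. The names `type_union`, `type_exists`, and `type_if` are illustrative only and are not taken from the paper's implementation.

```python
# Hypothetical type encoding: ("bool", level) and ("list", element_type, level).


def join(a: str, b: str) -> str:
    return "H" if "H" in (a, b) else "L"


def type_union(tm, tn):
    """UNION: M @ N keeps the element type; the list levels are joined."""
    _, elem_m, lm = tm
    _, elem_n, ln = tn
    if elem_m != elem_n:
        raise TypeError("element types of M and N differ")
    return ("list", elem_m, join(lm, ln))


def type_exists(tm):
    """EXISTS: only the list's structure is revealed; the element level is dropped."""
    _, _elem, level = tm
    return ("bool", level)


def type_if(tl, tm):
    """IF: whether M or [] is produced depends on L, so L's level is joined in."""
    _, lc = tl
    _, elem, lm = tm
    return ("list", elem, join(lc, lm))


names = ("list", ("string", "H"), "L")  # (string^H list)^L
print(type_exists(names))               # ('bool', 'L')
print(type_union(names, ("list", ("string", "H"), "H")))
# ('list', ('string', 'H'), 'H')
print(type_if(("bool", "H"), ("list", ("int", "L"), "L")))
# ('list', ('int', 'L'), 'H')
```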
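$$
\begin{array}{c}
\textsf{ConstQ}\ \dfrac{\Sigma(c) = t}{\Gamma; \Delta \vdash c : t}
\qquad
\textsf{VarQ}\ \dfrac{x : t \in \Delta}{\Gamma; \Delta \vdash x : t}
\qquad
\textsf{FunQ}\ \dfrac{\Gamma; \Delta, x : t \vdash e : t'}{\Gamma; \Delta \vdash \mathsf{fun}(x) \to e : t \to t'}
\\[3ex]
\textsf{ApplyQ}\ \dfrac{\Gamma; \Delta \vdash e_1 : t \to t' \quad \Gamma; \Delta \vdash e_2 : t}{\Gamma; \Delta \vdash e_1\ e_2 : t'}
\qquad
\textsf{OpQ}\ \dfrac{\Sigma(op) = \overline{t} \to t' \quad \Gamma; \Delta \vdash \overline{M : t^{\ell}}}{\Gamma; \Delta \vdash op(\overline{M}) : {t'}^{\,\bigsqcup \overline{\ell}}}
\\[3ex]
\textsf{PairQ}\ \dfrac{\Gamma; \Delta \vdash e_1 : t_1 \quad \Gamma; \Delta \vdash e_2 : t_2}{\Gamma; \Delta \vdash (e_1, e_2) : t_1 * t_2}
\qquad
\textsf{FstQ}\ \dfrac{\Gamma; \Delta \vdash e : t_1 * t_2}{\Gamma; \Delta \vdash \mathsf{fst}\ e : t_1}
\qquad
\textsf{SndQ}\ \dfrac{\Gamma; \Delta \vdash e : t_1 * t_2}{\Gamma; \Delta \vdash \mathsf{snd}\ e : t_2}
\\[3ex]
\textsf{RecordQ}\ \dfrac{\Gamma; \Delta \vdash \overline{M : t}}{\Gamma; \Delta \vdash \{\overline{f = M}\} : \{\overline{f : t}\}}
\qquad
\textsf{ProjectQ}\ \dfrac{\Gamma; \Delta \vdash L : \{\overline{f : t}\}}{\Gamma; \Delta \vdash L.f_i : t_i}
\\[3ex]
\textsf{NilQ}\ \dfrac{}{\Gamma; \Delta \vdash [\,] : (t\ \mathsf{list})^{\ell}}
\qquad
\textsf{YieldQ}\ \dfrac{\Gamma; \Delta \vdash M : t}{\Gamma; \Delta \vdash \mathsf{yield}\ M : (t\ \mathsf{list})^{\ell}}
\qquad
\textsf{ExistsQ}\ \dfrac{\Gamma; \Delta \vdash M : (t\ \mathsf{list})^{\ell}}{\Gamma; \Delta \vdash \mathsf{exists}\ M : \mathsf{bool}^{\ell}}
\\[3ex]
\textsf{IfQ}\ \dfrac{\Gamma; \Delta \vdash L : \mathsf{bool}^{\ell} \quad \Gamma; \Delta \vdash M : (t\ \mathsf{list})^{\ell'}}{\Gamma; \Delta \vdash \mathsf{if}\ L\ \mathsf{then}\ M : (t\ \mathsf{list})^{\ell \sqcup \ell'}}
\qquad
\textsf{UnionQ}\ \dfrac{\Gamma; \Delta \vdash M : (t\ \mathsf{list})^{\ell} \quad \Gamma; \Delta \vdash N : (t\ \mathsf{list})^{\ell'}}{\Gamma; \Delta \vdash M \mathbin{@} N : (t\ \mathsf{list})^{\ell \sqcup \ell'}}
\\[3ex]
\textsf{ForQ}\ \dfrac{\Gamma; \Delta \vdash M : (t\ \mathsf{list})^{\ell} \quad \Gamma; \Delta, x : t \vdash N : (t'\ \mathsf{list})^{\ell'}}{\Gamma; \Delta \vdash \mathsf{for}\ x\ \mathsf{in}\ M\ \mathsf{do}\ N : (t'\ \mathsf{list})^{\ell \sqcup \ell'}}
\\[3ex]
\textsf{SubQ}\ \dfrac{\ell \leq \ell' \quad \Gamma; \Delta \vdash M : t^{\ell}}{\Gamma; \Delta \vdash M : t^{\ell'}}
\qquad
\textsf{DatabaseQ}\ \dfrac{\Sigma(db) = \{\overline{f : T}\}}{\Gamma; \Delta \vdash \mathsf{database}(db) : \{\overline{f : T}\}}
\qquad
\textsf{Antiquote}\ \dfrac{\Gamma \vdash e : \mathsf{Expr}(t)}{\Gamma; \Delta \vdash (\%e) : t}
\end{array}
$$

Figure 9. Typing rules for quoted language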



Note that the rule QUOTE ensures that its argument is typed in an empty context for quoted expressions. This expresses that only closed quoted terms are allowed in this language. Running a quoted expression $e$ of type Expr($t$) using run $e$ results in an expression of type $t$ (rule RUN).

Expressions of the form database($db$) get their type from the mapping $\Sigma$. The rule ANTIQUOTE allows referencing entities defined in the host language from within a quoted expression. The argument of an antiquotation must itself be a quoted expression.

The rules SUB and SubQ allow raising the security level of an expression. $\ell \leq \ell'$ holds if and only if $\ell = L \vee \ell = \ell' = H$.

To illustrate the type system further, we explain the typing rule FOR in greater detail. Recall that for expressions are used to denote list comprehensions. The typing rule assigns the resulting list the join of the security levels of both subexpressions. The following example demonstrates why this is required.

Consider the following program, which uses a for expression to leak the structure of the lists xs and ys. We assume xs to have type $(t\ \mathsf{list})^{\ell}$ for some type $t$ and level $\ell$, whereas ys has type $(t'\ \mathsf{list})^{\ell'}$.

for x in xs do ys

Since the lists produced for each element of xs are concatenated, the resulting list has length $|xs| \times |ys|$, where $|a|$ denotes the length of $a$. If either xs or ys contains exactly one element, the length of the other list is revealed through the result. To account for this information flow, the resulting list is typed with level $\ell \sqcup \ell'$.
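The length argument and the resulting join can be checked on a tiny example. In the Python sketch below, `comprehension` and `type_for` are hypothetical helpers: the first mimics the concatenating semantics of for, the second the level computation of rule FOR over a tuple encoding of list types.

```python
def comprehension(xs, body):
    """for x in xs do body(x): concatenate the list produced for each element."""
    out = []
    for x in xs:
        out.extend(body(x))
    return out


def type_for(t_src, t_body):
    """FOR: the result level is the join of the source and body list levels."""
    _, _elem, l_src = t_src
    _, elem_body, l_body = t_body
    return ("list", elem_body, "H" if "H" in (l_src, l_body) else "L")


xs = ["only"]      # a single element ...
ys = [10, 20, 30]  # ... so the result reveals |ys|
result = comprehension(xs, lambda x: ys)
print(len(result), len(xs) * len(ys))  # 3 3
print(type_for(("list", ("string", "L"), "L"), ("list", ("int", "L"), "H")))
# ('list', ('int', 'L'), 'H')
```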

2.5 Soundness Result 

As explained above, the soundness result is stated in terms of noninterference, i.e., as the preservation of a low-equivalence relation under pairwise execution. If we start out in any two low-equivalent environments, then the results of running a well-typed program will be low-equivalent with respect to the type of the program.

Assuming that the typing of the execution environment corresponds to the capabilities of the attacker, noninterference guarantees that all information readable by the attacker is independent of confidential information. To make the connection between the database policy $\Sigma$ and the type system explicit, we write $\Sigma \vdash e : t$, even though $\Sigma$ was kept implicit in the type rules in Figures 8 and 9.

Theorem 1 (Typing soundness). If $\Sigma \vdash e : t$, then $NI(e, e)_{\Sigma,t}$.

Proof of Theorem 1. Immediate from Lemma 1 by expanding the
definition of NI, since $e$ is a closed term. □

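Lemma 1 (Typing soundness (generalized)). If $\overline{x} : \overline{t} \vdash e : t$,
$e[\overline{x} \mapsto \overline{v_1}] \to_{\Omega_1} v'_1$, $e[\overline{x} \mapsto \overline{v_2}] \to_{\Omega_2} v'_2$, $\Omega_1 \sim_{\Sigma} \Omega_2$, and
$\overline{v_1} \sim_{\overline{t}} \overline{v_2}$, then $v'_1 \sim_t v'_2$.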

Proof. By mutual induction over the typing derivation $\Gamma \vdash e : t$ and the
analogous statement for the quoted language. The full proof can be
found in the full version of the paper. □

3. Implementation 

Since F# contains an abundance of features not relevant to the
current development, we implement the language presented in
Section 2 rather than attempting to enrich the F# implementation with
security types. Our implementation compiles programs in this
language to executable F# code. Given that the presented language is
a subset of F#, the compilation consists mainly of removing level
annotations from the types in the program and establishing a connection
to the database server.

This allows reusing the F# infrastructure for language-integrated
query, as well as the improvements to this mechanism [16].

To simplify writing programs in the presented language, we
implement a type inference algorithm supporting polymorphism for
both levels and types. The basic approach is based on
constraint generation and unification [23]. For efficiency reasons,
the implementation is based on equality constraints, even though
full inference would require inequality constraints. Interpreting
inequality constraints as equality constraints introduces inaccuracies
that prevent the types of some programs from being inferred properly.
However, since constraints are only generated when they cannot
be shown to be satisfied at the point of introduction, it is always
possible to resolve any such inaccuracies by providing type
information in the form of type annotations. In practice, type inference
allows us to leave out many type annotations, as witnessed by the
examples in this paper.
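The loss of precision from equality constraints shows up in a two-constraint example. In the Python sketch below (a hypothetical illustration, not the Haskell implementation), a value of unknown level `a` flows into both an H context and an L context, giving the constraints a ⊑ H and a ⊑ L. The least solution of the inequalities sets a = L, whereas a union-find unifier that reads both constraints as equalities fails.

```python
from typing import Dict, List, Tuple

CONSTANTS = ("L", "H")


class LevelUnifier:
    """Union-find over level variables and the constants L and H.

    Every constraint is treated as an equality, which is the
    simplification described above.
    """

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def unify(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if ra in CONSTANTS and rb in CONSTANTS:
            raise TypeError(f"cannot unify level {ra} with {rb}")
        # Keep a constant as the representative of its class.
        if ra in CONSTANTS:
            self.parent[rb] = ra
        else:
            self.parent[ra] = rb


def least_solution(constraints: List[Tuple[str, str]]) -> Dict[str, str]:
    """Least solution of inequalities lo ⊑ hi: start every variable at L
    and raise variables to H until all constraints hold."""
    sol: Dict[str, str] = {}

    def value(x: str) -> str:
        return x if x in CONSTANTS else sol.get(x, "L")

    changed = True
    while changed:
        changed = False
        for lo, hi in constraints:
            if value(lo) == "H" and value(hi) == "L":
                if hi in CONSTANTS:
                    raise TypeError(f"{lo} ⊑ {hi} is unsatisfiable")
                sol[hi] = "H"
                changed = True
    return sol


# A value of unknown level a flows into both an H and an L context.
print(least_solution([("a", "H"), ("a", "L")]))  # {} : a stays L

u = LevelUnifier()
u.unify("a", "H")
try:
    u.unify("a", "L")
except TypeError as err:
    print("equality-based inference fails:", err)
```

Annotating the value with L lets both checks (L ⊑ H and L ⊑ L) be discharged at the point of introduction, so no constraint is generated at all. This is how a type annotation recovers from such a failure.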

The type-checker and compiler are implemented in Haskell,
using the BNFC tool [3] for generating parsing and lexing code.
The resulting binary takes a program in the language presented in
Section 2 and produces F# code as output if the program is
well-typed. If the program is not well-typed, an error message detailing
the reason for the type-checking failure is produced.

To illustrate the compilation, consider the output of the compiler
for the example from Section 2.1 that queries the database for
couples where the age difference between partners is greater than
10 years.

// import statements omitted

[<Literal>]
let ConnectionString_PeopleDB =
  "Data Source=.\\MyInstance;Initial \
   Catalog=PeopleDB;Integrated Security=SSPI"

type dbSchema_PeopleDB =
  SqlDataConnection<ConnectionString_PeopleDB>

let db_PeopleDB = dbSchema_PeopleDB.GetDataContext()

let db = <@ db_PeopleDB @>

type ResultType = { name : string; diff : int }

let differences : Expr<ResultType IQueryable> =
  <@ query { for c in (%db).Couples do
             for p1 in (%db).People do
             for p2 in (%db).People do
             if (c.Person1 = p1.Id) &&
                (c.Person2 = p2.Id) &&
                (abs (p1.Age - p2.Age) > 10) then
               yield { name = p1.Name
                     ; diff = p1.Age - p2.Age } } @>

let main = PLinq.Query.qquery
             { for x in (%differences) do yield x }
main

The full compiler output first imports all necessary libraries, as
well as the implementation of the supplementary concepts [16].
The subsequent part establishes a connection to the
database server running on the same machine. The compiler
generates a separate connection to the server for each database that is
used by the program. Type synonyms and function definitions are
compiled in a straightforward way. The main difference is that all
security levels have been removed from the types in the program.

For technical reasons, F# does not support query generation
for quoted list expressions, and therefore the compiler translates
occurrences of the list type to IQueryable instead. Moreover, we
translate expressions of the form run e into calls to a function
testPLinqQ from the implementation accompanying [16]. This
function takes a quoted expression, translates it into an SQL query,
executes it, and then returns the results.
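The type-level part of the compilation, which erases levels and replaces list by IQueryable, can be sketched as a recursive printer over security types. The Python sketch below uses a hypothetical AST (`Base`, `ListT`, `Fun`, `ExprT`) and assumes that every occurrence of list is replaced, as stated above. It is not the authors' Haskell code.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical AST for security types (illustrative names only).
@dataclass(frozen=True)
class Base:            # int^l, bool^l, string^l
    name: str
    level: str


@dataclass(frozen=True)
class ListT:           # (t list)^l
    elem: "SecType"
    level: str


@dataclass(frozen=True)
class Fun:             # t -> t'
    arg: "SecType"
    res: "SecType"


@dataclass(frozen=True)
class ExprT:           # Expr<t>, a quoted expression
    body: "SecType"


SecType = Union[Base, ListT, Fun, ExprT]


def to_fsharp(t: SecType) -> str:
    """Render t as an F# type: drop every level and map list to IQueryable."""
    if isinstance(t, Base):
        return t.name  # the level t.level is erased
    if isinstance(t, ListT):
        return f"IQueryable<{to_fsharp(t.elem)}>"
    if isinstance(t, Fun):
        return f"({to_fsharp(t.arg)} -> {to_fsharp(t.res)})"
    if isinstance(t, ExprT):
        return f"Expr<{to_fsharp(t.body)}>"
    raise TypeError(f"unknown type: {t!r}")


# Expr<string^L -> (int^L list)^L>, the type of cityRentals in Section 5
city_rentals = ExprT(Fun(Base("string", "L"), ListT(Base("int", "L"), "L")))
print(to_fsharp(city_rentals))  # Expr<(string -> IQueryable<int>)>
```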

Since our approach is purely static and all security type
information is erased during compilation, performance is unaffected
compared to ordinary F# code. Additionally, by reusing the results
from Cheney et al. [16], we are able to benefit from the
optimizations to F#'s LINQ mechanism presented there. Cheney et al.
include a performance evaluation that is also valid for this
implementation.

The code for the implementation is available online; the URL
is given in Section 1.



4. Algebraic Data Types 

We extend the language presented so far with algebraic data types
and information-flow control for them.

This enriches the language with a way to express parametrized
recursive data types that subsumes tuples and records. The addition
is a proper extension to the language; neither tuples nor records
can be recursive or parametrized in our language. We argue that
introducing algebraic data types is a natural development due to
their expressiveness and easy deconstruction via pattern matching.
Encoding algebraic data types in an extended notion of records
would require extensions to the existing constructs similar to
those needed to add algebraic data types, and the result would be
significantly less elegant.

Algebraic data types allow for the definition of new data types
by composing existing data types. An algebraic data type consists
of one or more constructors that can contain another type as their
argument, including recursive occurrences of the defined data type.
Pattern matching is used to deconstruct values of an algebraic data
type by matching against the different constructors and their
parameters. The data contained in the parameters of a value can
be extracted by giving a variable in the pattern.

Syntax Without loss of generality, consider an algebraic data type
$T$ with type argument $\alpha$, which can be a product of several type
variables. Constructors $C_1, \dots, C_k$ have the form $C_i$ of $t_i$, where $t_i$
is the argument type of the constructor. Constructors with no arguments
can be considered to take a value of unit type as an argument. For
clarity, we only match on the outermost constructor of a single
expression at a time. To track information flow, a security level
annotation $\ell$ is added to the type, written $(\alpha\ T)^{\ell}$. The expressions and values
are extended as follows:

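$$
\begin{array}{rcl}
e & ::= & \dots \mid C_i\ e \mid \mathtt{match}\ e\ \mathtt{with}\ C_1\ x_1 \to e_1; \dots; C_k\ x_k \to e_k \\
E & ::= & \dots \mid C_i\ E \mid \mathtt{match}\ E\ \mathtt{with}\ C_1\ x_1 \to e_1; \dots; C_k\ x_k \to e_k \\
Q & ::= & \dots \mid C_i\ Q \mid \mathtt{match}\ Q\ \mathtt{with}\ C_1\ x_1 \to e_1; \dots; C_k\ x_k \to e_k \\
V & ::= & \dots \mid C_i\ V
\end{array}
$$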

Semantics The semantics is extended with the following rules for
evaluation of constructors and pattern matching.

$$
(\mathtt{match}\ C_i\ v\ \mathtt{with} \mid C_1\ x_1 \to e_1 \mid \dots \mid C_k\ x_k \to e_k) \;\to\; e_i[x_i \mapsto v]
$$

These rules correspond to the usual semantics of algebraic data
types in other functional languages. Constructors with values as
arguments are themselves values and cannot be evaluated further. If a
constructor argument is not a value, it is evaluated first. Match
expressions first evaluate the expression that is being matched on and
then evaluate the appropriate branch, binding the argument of the
constructor to the variable given in the pattern.
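The evaluation of constructors and match expressions can be sketched as a tiny call-by-value interpreter fragment in Python. Constructor values are represented by a hypothetical `Con` class. Each branch is a Python function of the bound variable, so the substitution $e_i[x_i \mapsto v]$ becomes function application.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Con:
    """Constructor value C_i v; nullary constructors carry the unit value ()."""
    name: str
    arg: object = ()


def match(v: object, branches: Dict[str, Callable[[object], object]]) -> object:
    """Evaluate match v with C_1 x_1 -> e_1; ...; C_k x_k -> e_k.

    The scrutinee v is assumed to be evaluated already (call-by-value).
    The branch for v's constructor is selected, and its body is run with
    the constructor argument bound to the pattern variable.
    """
    if not isinstance(v, Con):
        raise TypeError("match expects a constructor value")
    return branches[v.name](v.arg)


# Cons (1, Cons (2, Nil)) as nested constructor values
xs = Con("Cons", (1, Con("Cons", (2, Con("Nil")))))


def length(lst: object) -> int:
    return match(lst, {"Nil": lambda _: 0,
                       "Cons": lambda x: 1 + length(x[1])})


print(length(xs))  # 2
```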

Type system To support algebraic data types in the type system,
we use two rule schemas, which generate several typing rules for
each algebraic data type in the program. For each algebraic data
type with $k$ constructors, one rule for match expressions and $k$
typing rules for the constructors are added.

The rule schema for constructors takes into account that type
arguments to constructors might contain the type that is being
defined. In that case, their level annotations need to be combined
to keep the structure of the value confidential. Formally, $\ell_T \in S_{t_i}$ holds for all
components of $t_i$ of the form $(\alpha\ T)^{\ell_T}$.
In the case of match expressions, the structure of the algebraic
data type is used to decide which branch to evaluate. To track this
flow of information, the type of the branches needs to be upgraded
to the level annotation of the algebraic data type. For this, we define
an upgrade function $\mathrm{upg}(t, \ell)$, which upgrades the type $t$
to have at least level $\ell$ in its outermost components.

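Definition 4 (Upgrade function). $\mathrm{upg}(t, \ell)$ is defined by recursion
on the structure of $t$:

$$
\begin{array}{rcl}
\mathrm{upg}(\mathit{int}^{\ell}, \ell') & = & \mathit{int}^{\ell \sqcup \ell'} \\
\mathrm{upg}(\mathit{bool}^{\ell}, \ell') & = & \mathit{bool}^{\ell \sqcup \ell'} \\
\mathrm{upg}(\mathit{string}^{\ell}, \ell') & = & \mathit{string}^{\ell \sqcup \ell'} \\
\mathrm{upg}(t \to t', \ell') & = & t \to \mathrm{upg}(t', \ell') \\
\mathrm{upg}(t_1 * t_2, \ell') & = & \mathrm{upg}(t_1, \ell') * \mathrm{upg}(t_2, \ell') \\
\mathrm{upg}(\{\overline{f : t}\}, \ell') & = & \{\overline{f : \mathrm{upg}(t, \ell')}\} \\
\mathrm{upg}((t\ \mathit{list})^{\ell}, \ell') & = & (t\ \mathit{list})^{\ell \sqcup \ell'} \\
\mathrm{upg}(\mathit{Expr}\langle t \rangle, \ell') & = & \mathit{Expr}\langle \mathrm{upg}(t, \ell') \rangle \\
\mathrm{upg}((\alpha\ T)^{\ell}, \ell') & = & (\alpha\ T)^{\ell \sqcup \ell'}
\end{array}
$$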
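Definition 4 translates directly into a recursive function. The Python sketch below uses a hypothetical AST for security types; the type argument of an algebraic data type is kept as an opaque string for simplicity. Only outermost annotations are raised, so the elements of a list keep their level.

```python
from dataclasses import dataclass, replace
from typing import Tuple, Union


def join(l1: str, l2: str) -> str:
    return "H" if "H" in (l1, l2) else "L"


# Hypothetical AST for the security types of Definition 4.
@dataclass(frozen=True)
class Base:      # int^l, bool^l, string^l
    name: str
    level: str


@dataclass(frozen=True)
class Fun:       # t -> t' (carries no outer level)
    arg: "Ty"
    res: "Ty"


@dataclass(frozen=True)
class Prod:      # t1 * t2
    left: "Ty"
    right: "Ty"


@dataclass(frozen=True)
class Record:    # {f1 : t1; ...; fn : tn}
    fields: Tuple[Tuple[str, "Ty"], ...]


@dataclass(frozen=True)
class ListT:     # (t list)^l
    elem: "Ty"
    level: str


@dataclass(frozen=True)
class ExprT:     # Expr<t>
    body: "Ty"


@dataclass(frozen=True)
class Adt:       # (alpha T)^l, type argument kept as an opaque string
    name: str
    arg: str
    level: str


Ty = Union[Base, Fun, Prod, Record, ListT, ExprT, Adt]


def upg(t: Ty, l: str) -> Ty:
    """Raise the outermost level annotations of t to at least l."""
    if isinstance(t, (Base, ListT, Adt)):
        return replace(t, level=join(t.level, l))
    if isinstance(t, Fun):
        return Fun(t.arg, upg(t.res, l))
    if isinstance(t, Prod):
        return Prod(upg(t.left, l), upg(t.right, l))
    if isinstance(t, Record):
        return Record(tuple((f, upg(ft, l)) for f, ft in t.fields))
    if isinstance(t, ExprT):
        return ExprT(upg(t.body, l))
    raise TypeError(f"unknown type: {t!r}")


# int^L * (bool^L list)^L upgraded to H: the list elements stay L.
print(upg(Prod(Base("int", "L"), ListT(Base("bool", "L"), "L")), "H"))
```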



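$$
\textrm{(CONSTR)}\ \frac{\Gamma \vdash e : t_i}{\Gamma \vdash C_i\ e : (\alpha\ T)^{\bigsqcup_{\ell_T \in S_{t_i}} \ell_T}}
$$

$$
\textrm{(MATCH)}\ \frac{\Gamma \vdash e : (\alpha\ T)^{\ell} \qquad \forall\, 1 \le i \le k.\ \ \Gamma, x_i : t_i \vdash e_i : t}{\Gamma \vdash (\mathtt{match}\ e\ \mathtt{with} \mid C_1\ x_1 \to e_1 \mid \dots \mid C_k\ x_k \to e_k) : \mathrm{upg}(t, \ell)}
$$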

In the rule CONSTR, $\bigsqcup_{\ell_T \in S_{t_i}} \ell_T$ denotes the join of all levels
on occurrences of $T$ in the type $t_i$. This ensures that the level
annotation on the resulting value is not lower than that of its recursive
components. For instance, the constructor rule for the node constructor
of a binary tree type requires the structure of the constructed tree
to be at least as confidential as the structures of the two subtrees.
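The level that rule CONSTR assigns to a constructed value can be computed by collecting the levels of all occurrences of the defined type in the constructor's argument type and joining them. The Python sketch below uses a hypothetical tuple encoding of argument types, chosen only for illustration.

```python
from typing import Iterator, Union

# Hypothetical encoding of a constructor argument type:
#   ("T", level)   an occurrence (alpha T)^level of the type being defined
#   ("*", t1, t2)  a product t1 * t2
#   any string     any other component, e.g. "'a" or "unit"
ArgType = Union[str, tuple]


def occurrence_levels(t: ArgType) -> Iterator[str]:
    """Yield the level of every occurrence of the defined type T in t."""
    if isinstance(t, tuple) and t[0] == "T":
        yield t[1]
    elif isinstance(t, tuple) and t[0] == "*":
        yield from occurrence_levels(t[1])
        yield from occurrence_levels(t[2])


def constr_level(t: ArgType) -> str:
    """Level of C_i e in rule CONSTR: the join of all occurrence levels.
    With no occurrences (e.g. Leaf, Nil) the empty join is L."""
    return "H" if "H" in occurrence_levels(t) else "L"


# Node of (('a BinTree * 'a) * 'a BinTree) with a public left subtree
# structure and a secret right subtree structure:
node_arg = ("*", ("*", ("T", "L"), "'a"), ("T", "H"))
print(constr_level(node_arg))  # H
print(constr_level("unit"))    # L
```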

To extend the soundness result for the type system
to algebraic data types, the family of equivalence relations $\sim_t$ also
needs to be extended for each algebraic data type. In doing so, we
follow the intuition given for $\sim_t$ in the base language. The level
annotation $\ell$ on $(\alpha\ T)^{\ell}$ corresponds to the confidentiality of the
structure of the type, i.e., which constructor a value consists of. If $\ell$
is high, we consider the entire value, including its components, to be
confidential.

It should be pointed out that the rule schemas assume that the
defined algebraic data types are well-formed, i.e.,

• recursive occurrences of the defined type must have the same
type argument $\alpha$, and

• the only type variables that can occur in arguments to
constructors must be type variables in $\alpha$.

Soundness The low-equivalence relation is extended to the
values of algebraic data types. As for the built-in list data type, if
$\ell = L$, arguments to constructors may or may not be confidential,
depending on their level annotations.

$$
\frac{\ell = L \implies (i = j \wedge v_1 \sim_{t_i} v_2)}{C_i\ v_1 \sim_{(\alpha\ T)^{\ell}} C_j\ v_2}
$$
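The low-equivalence rule for algebraic data types can be read as a predicate on pairs of values. In the Python sketch below, `Con` and the argument relation `arg_equiv` are hypothetical, and the example type `Priv of int^H | Pub of int^L` is invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Con:
    name: str    # constructor C_i
    arg: object  # constructor argument v


def adt_low_equiv(level: str,
                  arg_equiv: Callable[[str, object, object], bool],
                  v1: Con, v2: Con) -> bool:
    """Decide C_i v1 ~ C_j v2 at type (alpha T)^level.

    If level is H, the premise l = L is false, so any two values are related.
    If level is L, the constructors must coincide and the arguments must be
    related at the constructor's argument type, decided by arg_equiv.
    """
    if level == "H":
        return True
    return v1.name == v2.name and arg_equiv(v1.name, v1.arg, v2.arg)


# Hypothetical type: Priv of int^H | Pub of int^L.
def arg_equiv(cname: str, a1: object, a2: object) -> bool:
    # H-labeled arguments are always related; L-labeled ones must be equal.
    return True if cname == "Priv" else a1 == a2


print(adt_low_equiv("L", arg_equiv, Con("Priv", 1), Con("Priv", 2)))  # True
print(adt_low_equiv("L", arg_equiv, Con("Priv", 1), Con("Pub", 1)))   # False
print(adt_low_equiv("H", arg_equiv, Con("Priv", 1), Con("Pub", 1)))   # True
```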

We prove the same soundness theorem as for the base language
in this extended setting.

Theorem 2. If $\vdash e : t$, $\Omega_1 \sim_{\Sigma} \Omega_2$, $e \to_{\Omega_1} v_1$ and $e \to_{\Omega_2} v_2$,
then $v_1 \sim_t v_2$.

Proof. Extension of the proof of Lemma 1 for the new typing rules
that are induced by algebraic data types. □

Note that while the theorem statement is the same, the set of 
types and expressions is now potentially larger, since it is extended 
in accordance with the algebraic data types defined in e. 

Example: lists One common use for algebraic data types is to
define recursive structures such as lists. To demonstrate that our
extension is capable of supporting such use cases, consider the
following user-defined list data type:

type 'a MyList =
  | Nil
  | Cons of ('a * 'a MyList)

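Instantiating the above rule schemas for the user-defined list
type MyList yields the following three type rules: two for the
constructors and one for the matching.

$$
\frac{}{\Gamma \vdash \mathtt{Nil} : (\alpha\ \mathit{MyList})^{\ell}}
\qquad
\frac{\Gamma \vdash e_1 : \alpha \qquad \Gamma \vdash e_2 : (\alpha\ \mathit{MyList})^{\ell}}{\Gamma \vdash \mathtt{Cons}\ (e_1, e_2) : (\alpha\ \mathit{MyList})^{\ell}}
$$

$$
\frac{\Gamma \vdash e : (\alpha\ \mathit{MyList})^{\ell} \qquad \Gamma \vdash e_1 : t \qquad \Gamma, x : \alpha * (\alpha\ \mathit{MyList})^{\ell} \vdash e_2 : t}{\Gamma \vdash \mathtt{match}\ e\ \mathtt{with} \mid \mathtt{Nil} \to e_1 \mid \mathtt{Cons}\ x \to e_2 : \mathrm{upg}(t, \ell)}
$$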

The generated rules match the intuitions given for the rest of
the type system. Since a match expression reveals information about
the results of the branches (which have level $\ell'$) as well as about the
structure of the list it matches on (i.e., level $\ell$), the level of the
result is $\ell \sqcup \ell'$. Moreover, the type system allows us to define
functions corresponding to the yield, exists, @, and for constructs
that are built into the language. The inferred type of each definition
is given as a comment. Since the implementation sometimes generates
extraneous type variables in inferred types that have no effect on
generality, we give slightly simplified but equivalent types here.

// t -> (t MyList)^ℓ
let yield' =
  fun x -> Cons (x, Nil)

// (t MyList)^ℓ -> bool^ℓ
let exists' = fun xs -> match xs with
  | Nil -> False
  | Cons xs' -> True

// (t MyList)^ℓ -> (t MyList)^ℓ -> (t MyList)^ℓ
let rec union' = fun xs -> fun ys -> match xs with
  | Nil -> ys
  | Cons xs' -> Cons (fst xs', union' (snd xs') ys)

// (t1 MyList)^ℓ1 -> (t1 -> (t2 MyList)^(ℓ1 ⊔ ℓ2))
//   -> (t2 MyList)^(ℓ1 ⊔ ℓ2)
let rec for' =
  fun xs -> fun f -> match xs with
  | Nil -> Nil
  | Cons xs' -> union' (f (fst xs'))
                       (for' (snd xs') f)

Note that the types of these functions correspond roughly to the
typing rules given for the built-in constructs. In the case of union'
and for', however, the type is slightly more restrictive than the
typing rule, due to the way recursion is type-checked. These
restrictions only affect the types of arguments and at most require
lifting an argument expression to a higher security level.
Example: trees To further illustrate algebraic data types in the
context of information flow, we discuss another common use,
namely tree structures. We define an algebraic data type for binary
trees:

type 'a BinTree =
  | Leaf
  | Node of (('a BinTree * 'a) * 'a BinTree)

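In the same manner as for the user-defined list type, this will
result in one rule for match expressions and two rules for the
constructors:

$$
\frac{}{\Gamma \vdash \mathtt{Leaf} : (\alpha\ \mathit{BinTree})^{\ell}}
\qquad
\frac{\Gamma \vdash e : ((\alpha\ \mathit{BinTree})^{\ell_1} * \alpha) * (\alpha\ \mathit{BinTree})^{\ell_2}}{\Gamma \vdash \mathtt{Node}\ e : (\alpha\ \mathit{BinTree})^{\ell_1 \sqcup \ell_2}}
$$

$$
\frac{\Gamma \vdash e : (\alpha\ \mathit{BinTree})^{\ell_1 \sqcup \ell_2} \qquad \Gamma \vdash e_1 : t \qquad \Gamma, x : ((\alpha\ \mathit{BinTree})^{\ell_1} * \alpha) * (\alpha\ \mathit{BinTree})^{\ell_2} \vdash e_2 : t}{\Gamma \vdash \mathtt{match}\ e\ \mathtt{with} \mid \mathtt{Leaf} \to e_1 \mid \mathtt{Node}\ x \to e_2 : \mathrm{upg}(t, \ell_1 \sqcup \ell_2)}
$$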

The two typing rules for the constructors ensure that the
confidentiality of the tree structure is propagated correctly from the
subtrees that are passed to the Node constructor. This construction is
analogous to the typing rules for lists in that the structure of the tree
might be public while the tree elements might be confidential.

To illustrate the last point, consider a tree where the structure of 
the tree is not confidential while its elements are secrets: 

let privTree : (int^H BinTree)^L =
  Node ((Leaf, (5 : int^H)),
        Node ((Leaf, (6 : int^H)), Leaf))

Since only the elements stored in the nodes are considered private,
counting the number of leaves of this tree can be typed with L:

let rec countLeaves =
  fun t -> match t with
  | Leaf -> 1
  | Node x -> countLeaves (fst (fst x)) +
              countLeaves (snd x)

let result : int^L = countLeaves privTree

In contrast, trying to add all the integers in this tree and
annotating the result with a low type will not type-check, since the
computation involves more than merely the structure of the tree:

let rec sumElements =
  fun t -> match t with
  | Leaf -> 0
  | Node x -> sumElements (fst (fst x)) +
              snd (fst x) +
              sumElements (snd x)

// this is not well-typed:
let result' : int^L = sumElements privTree
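The difference between the two functions can be simulated by dynamic label propagation, which approximates what the type system checks statically. The Python sketch below uses hypothetical names and flattens trees to `(left, element, right)` triples instead of the paper's `((left, element), right)`. As in privTree, the tree structure is labeled L and the elements H.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


def join(l1: str, l2: str) -> str:
    return "H" if "H" in (l1, l2) else "L"


@dataclass(frozen=True)
class Lab:
    """A value paired with its security level."""
    value: int
    level: str

    def __add__(self, other: "Lab") -> "Lab":
        return Lab(self.value + other.value, join(self.level, other.level))


# A tree is None (Leaf) or (left, element, right) (Node); its structure is public.
Tree = Optional[Tuple["Tree", Lab, "Tree"]]


def count_leaves(t: Tree) -> Lab:
    # Only inspects constructors, whose level (L) is the tree's structure level.
    if t is None:
        return Lab(1, "L")
    left, _, right = t
    return count_leaves(left) + count_leaves(right)


def sum_elements(t: Tree) -> Lab:
    # Also reads the H-labeled elements, so the result becomes H.
    if t is None:
        return Lab(0, "L")
    left, elem, right = t
    return sum_elements(left) + elem + sum_elements(right)


# privTree from the example: Node ((Leaf, 5), Node ((Leaf, 6), Leaf))
priv_tree: Tree = (None, Lab(5, "H"), (None, Lab(6, "H"), None))
print(count_leaves(priv_tree))  # Lab(value=3, level='L')
print(sum_elements(priv_tree))  # Lab(value=11, level='H')
```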

5. Case Study: Movie Rental Database 

In this section, we exemplify the type system on a realistic
example: a database used by a movie rental chain to keep track of
customer records, whose schema is depicted in Figure 10. The example
data and database schema [10] are courtesy of postgresqltutorial.com,
with permission to use their sample database in this work. The database
contains information about approx. 16000 rentals, 600 customers,
and 1000 movies. We use an existing sample database to demonstrate
that our technique is applicable to database schemas that
were not designed with information-flow security in mind.

We first introduce a security policy for the database and consider 
various interesting queries that can be performed. Using the same 
setting, we illustrate the use of algebraic data types. 

5.1 Basic Queries 

The database keeps track of various information related to the
movie rentals. Each rental is associated with a film, a customer,
and a payment. The payments contain payment information and
identify the staff member and the customer involved in the transaction.
For both staff and customers, address information is stored.

A reasonable security policy for such a database is to consider
the names and exact addresses of customers and staff as
confidential, while the rest of the data is considered public. In
particular, the city of customers and the payment information are not
considered confidential. The former is not a problem unless the city
uniquely identifies a person, and the latter does not contain any
sensitive information. This security policy allows querying the database
for various interesting statistical information without disclosing
confidential information about the customers.
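The policy can be pictured as a labeling of columns. The Python sketch below is a coarse, hypothetical approximation of the type system: it assigns each column a level and labels a query result with the join of the levels of all columns that the query filters on or yields. Which address columns count as "exact addresses" is our assumption. Table and column names follow the sample schema in Figure 10, and the two column lists anticipate the queries discussed next.

```python
from typing import Iterable, Set, Tuple

Column = Tuple[str, str]  # (table, column), names as in the sample schema

# Our reading of the policy: names and exact addresses of customers and
# staff are H; every other column is L. Which address columns count as
# "exact" is an assumption.
CONFIDENTIAL: Set[Column] = {
    ("customer", "first_name"), ("customer", "last_name"),
    ("staff", "first_name"), ("staff", "last_name"),
    ("address", "address"), ("address", "address2"),
}


def column_level(col: Column) -> str:
    return "H" if col in CONFIDENTIAL else "L"


def query_level(columns_read: Iterable[Column]) -> str:
    """Coarse approximation: the result depends on every column the query
    filters on or yields, so its level is the join of their levels."""
    return "H" if any(column_level(c) == "H" for c in columns_read) else "L"


# Columns touched by cityRentals and customersWhoRented
city_rentals = [("city", "city"), ("city", "city_id"), ("address", "city_id"),
                ("address", "address_id"), ("customer", "address_id"),
                ("customer", "customer_id"), ("rental", "customer_id"),
                ("rental", "rental_id")]
customers_who_rented = [("film", "title"), ("film", "film_id"),
                        ("inventory", "inventory_id"), ("inventory", "film_id"),
                        ("rental", "inventory_id"), ("rental", "customer_id"),
                        ("customer", "customer_id"), ("customer", "last_name")]
print(query_level(city_rentals), query_level(customers_who_rented))  # L H
```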

Consider, for instance, the following example, which collects all 
rental ids for a given city. 

let db = <@ database "Rentals" @>

let findCityId =
  <@ fun city ->
       for c in (%db).City do
         if c.City1 = city then
           yield c.City_id @>

let cityRentals : Expr<string^L -> int^L list^L> =
  <@ fun city ->
       for cid in (%findCityId) city do
       for r in (%db).Rental do
       for cu in (%db).Customer do
       for a in (%db).Address do
       if a.City_id = cid &&
          cu.Address_id = a.Address_id &&
          r.Customer_id = cu.Customer_id
       then yield r.Rental_id @>

First in the example is the function findCityId, which collects
the city ids for a city of a given name. This function is used in
cityRentals to look for rentals by customers living in that city.
Note that while customer data is used, the type system ensures that
only non-sensitive data affects the computation of the result. The
rental ids can easily be used to produce interesting statistics about
the relative popularity of films in different cities.

In contrast, trying to find all customers who rented a particular
movie forces the result to be secret, since the names of the
customers are confidential. Thus, the following program is rejected
by the type checker:

let rentalsForMovieTitle =
  <@ fun title ->
       for f in (%db).Film do
       for r in (%db).Rental do
       for i in (%db).Inventory do
       if f.Title = title &&
          r.Inventory_id = i.Inventory_id &&
          i.Film_id = f.Film_id
       then yield r @>

let customersWhoRented
    : Expr< string^L -> string^L list^L > =
  <@ fun title ->
       for r in (%rentalsForMovieTitle) title do
       for c in (%db).Customer do
       if c.Customer_id = r.Customer_id
       then yield c.Last_name @>

Figure 10. E-R diagram of movie rental database (image courtesy of postgresqltutorial.com)

The reason for the (correct) type error is that the first and last
names of customers are typed as string^H, while the function
customersWhoRented attempts to return a list containing
elements of type string^L. Changing the security annotation to reflect
this makes the type system accept the program.

More complicated queries can be handled with the same ease as 
the simpler above examples. Consider, for instance, the following 
query that finds all movies that were rented at least twice by the 
same customer: 

let moviesRentedTwice : Expr< int^L list^L > =
  <@ for r1 in (%db).Rental do
     for r2 in (%db).Rental do
     for i in (%db).Inventory do
     for f in (%db).Film do
     for c in (%db).Customer do
     if not (r1.Rental_id = r2.Rental_id) &&
        r1.Inventory_id = i.Inventory_id &&
        r2.Inventory_id = i.Inventory_id &&
        i.Film_id = f.Film_id &&
        r1.Customer_id = c.Customer_id &&
        r2.Customer_id = c.Customer_id
     then yield f.Film_id @>

Thus, the above examples clearly illustrate the power of the
method. By giving a security policy for the contents of the database,
we are able to track information flow in advanced queries in terms
of the information flow of the quoted language. Not only does this
allow us to establish secure information flow in programs that
interact with databases, it does so in a way that is intuitively simple
to understand; an additional benefit of expressing the database
interaction in a homogeneous way is that it makes the information
flow in the interaction more immediate.

5.2 Algebraic Data Types 

To demonstrate the usefulness of information-flow tracking for
algebraic data types discussed in Section 4 in a more practical
setting, we will now consider an example demonstrating information-flow
tracking from values stored in the database in conjunction with
user-defined algebraic data types.

One plausible scenario for the use of such a database is to
aggregate some information about the customer in order to make
predictions about which movies he would be interested in. For
instance, one might want to determine a user's favorite category
along with the movies he watched in that category. To that end,
we can introduce the following algebraic data type that encodes
a category along with a list of movie ids (for simplicity, we only
consider two categories):

type Category =
  | Action of (int^L list^L)
  | Scifi of (int^L list^L) ;;

Moreover, this information could be part of a larger record 
that stores information about a customer, where some information 
might be confidential and should not be used when the program 
output can be observed by the attacker. As an example, we consider 
a program that produces records of the following form: 

type UserInfo =
  { uid : int^L
  ; firstName : string^H
  ; lastName : string^H
  ; favoriteCategory : Category^L
  } ;;

With this type, information about the favorite movie genre of 
a user can be used for prediction purposes, while the actual name 
of the customer cannot be retrieved without the resulting program 
being typed as H. 

The following code produces a list of categories along with 
movie ids that a user, identified by their id, has rented: 

let filmsByCustomer =
  <@ fun uid ->
       for r in (%db).Rental do
       for i in (%db).Inventory do
       for f in (%db).Film do
       if r.Customer_id = uid &&
          r.Inventory_id = i.Inventory_id &&
          i.Film_id = f.Film_id
       then yield f.Film_id @> ;;

let filmCategories =
  <@ fun fid ->
       for c in (%db).Category do
       for cf in (%db).Film_category do
       if cf.Film_id = fid &&
          cf.Category_id = c.Category_id
       then yield c.Name @> ;;

let userMovieInfo =
  <@ fun uid ->
       for fid in (%filmsByCustomer) uid do
       for cname in (%filmCategories) fid do
       yield { catname = cname ; fid = fid } @> ;;

Since the LINQ framework in F# does not allow producing
values of user-defined algebraic data types from within a query,
we first need to produce a record that contains the list of categories
and movie ids returned by userMovieInfo.

let compileStats : Expr< UserInfo list^L > =
  <@ for cust in (%db).Customer do
     yield { uid = cust.Customer_id
           ; firstName = cust.First_name
           ; lastName = cust.Last_name
           ; movieCategories =
               (%userMovieInfo) cust.Customer_id } @>

To turn the information in the movieCategories field into
an element of the defined algebraic data type, the following code
counts the given list of movie data and then produces a value of type
Category depending on which category occurs more often. (This
is intentionally not written in a functional style to avoid having to
introduce many additional auxiliary functions commonly found in
functional languages.) The code then constructs a new record with
the movieCategories field replaced by the user's favorite category.

Note that this computation now takes place in the host language, 
and security levels from the query result are propagated to these 
functions. 

let updateCount =
  fun minfo -> fun statsrec ->
    { actionMovies =
        if' (minfo.catname = "action")
            (yield minfo.fid) []
          @ statsrec.actionMovies
    ; scifiMovies =
        if' (minfo.catname = "scifi")
            (yield minfo.fid) []
          @ statsrec.scifiMovies }

let emptyCounts = { actionMovies = [] ; scifiMovies = [] }

let countCategories =
  fun catList -> fold catList updateCount emptyCounts

let favoriteUserCategory =
  fun minfos ->
    let statsrec = countCategories minfos
    in (if' (length statsrec.actionMovies >
             length statsrec.scifiMovies)
          (Action (statsrec.actionMovies))
          (Scifi (statsrec.scifiMovies)))

let stats = map (fun x ->
                   { uid = x.uid
                   ; firstName = x.firstName
                   ; lastName = x.lastName
                   ; favoriteCategory =
                       favoriteUserCategory (x.movieCategories) })
                (run compileStats)

let getCategories : Category^L list^L =
  map (fun x -> favoriteUserCategory
                  (x.movieCategories))
      (run compileStats)

The type system then correctly infers that the computation of the
category does not depend on confidential information about
the user, while the name fields of the resulting records do. Here,
if' works like the built-in if construct, except that it can produce
values that are not lists and also requires an expression for the else
case.
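Both the category computation and the level argument can be mirrored in a few lines of Python. This is a hypothetical sketch, not generated code. `favorite_user_category` follows the logic of favoriteUserCategory, up to the order of film ids, which depends on the direction of fold. `level_of` joins field levels the way the type system propagates them into dependent results.

```python
from typing import Dict, Iterable, List, Tuple


def favorite_user_category(minfos: List[Tuple[str, int]]) -> Tuple[str, List[int]]:
    """Mirror favoriteUserCategory: split (catname, fid) pairs by category
    and return the category with more films; ties go to Scifi, as in the if'."""
    action = [fid for cat, fid in minfos if cat == "action"]
    scifi = [fid for cat, fid in minfos if cat == "scifi"]
    return ("Action", action) if len(action) > len(scifi) else ("Scifi", scifi)


# Field levels of the intermediate records produced by compileStats.
FIELD_LEVELS: Dict[str, str] = {
    "uid": "L", "firstName": "H", "lastName": "H", "movieCategories": "L",
}


def level_of(fields_read: Iterable[str]) -> str:
    """Join of the levels of all fields a computation depends on."""
    return "H" if any(FIELD_LEVELS[f] == "H" for f in fields_read) else "L"


print(favorite_user_category([("action", 1), ("scifi", 2), ("action", 3)]))
# getCategories only reads movieCategories ...
print(level_of(["movieCategories"]))                           # L
# ... whereas a query that also branches on the names depends on H data.
print(level_of(["firstName", "lastName", "movieCategories"]))  # H
```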

Moreover, attempting to find the favorite category of one
particular user, identified by name, and typing the result with L will be
prevented by the type checker. Concretely, an example such as the
following will be rejected by the type checker:

let attack : Category^L list^L =
  for x in stats do
  if x.firstName = "John" && x.lastName = "Doe"
  then yield x.favoriteCategory

6. Related Work 

Until recently, little work has been done on bridging
information-flow controls for applications [13, 29, 30, 41] and the databases
they manipulate. While mainstream database management systems such
as PostgreSQL [...], SQL Server [8], and MySQL [6] include
protection mechanisms at the level of tables and columns, these
mechanisms are, as is, decoupled from applications.

Below, we focus on the work that shares our motivation of in- 
tegrating the security mechanisms of the application and database, 
with the goal of tracking information flow. 

WebSSARI by Huang et al. [32] is a tool that combines static
analysis with instrumented runtime checks. The focus is on PHP
applications that interact with an SQL database. The system
succeeds at discovering a number of vulnerabilities in PHP applications.
Given its complexity, its soundness is only considered informally.

Li and Zdancewic [35] present an imperative security-typed
language suitable for web scripting and a general architecture that
includes data storage, access control, and presentation layers. The
focus is on suitable labels for confidentiality and integrity policies,
as well as the possibilities of safe label downgrading [44]. No
soundness results for the type system are reported.

A line of work has originated from, or been influenced by,
Links by Cooper et al. [21], a strongly typed multi-tier functional
language for the web. Links supports higher-order queries. On the
other hand, Links comes with a non-standard database backend,
which makes interoperability non-trivial.

DIFCA-J by Yoshihama et al. [53] is an architecture for
dynamic information-flow tracking in Java. The architecture covers
database queries as performed by Java programs via the Java Database
Connectivity (JDBC) API.

Baltopoulos and Gordon [12] study secure compilation by
augmenting the Links compiler with encryption and authentication of
data stored on the client. Source-level reasoning is formalized by
a type-and-effect system for a concurrent λ-calculus. Refinement
types are used to guarantee that integrity properties of source code
are preserved by compilation.

SELinks by Corcoran et al. [22] also builds on Links. With the
Fable type system by Swamy et al. [49] at the core, the authors
study the propagation of labels, as described by user-defined func- 
tions, through database queries. Fable's flexibility accommodates 
a variety of policies, including dynamic information-flow control, 
provenance, and general safety policies based on security automata. 

DBTaint by Davis and Chen [24] shows how to enhance
database data types with one-bit taint information and instantiates
the approach for two example languages in the web context: Perl and Java.

Chlipala's UrFlow [18] offers a static information-flow analysis
as part of the Ur/Web domain-specific language for the
development of web applications. Policies can be defined in terms of
SQL queries. User-dependent policies are expressed in terms of the
users' runtime knowledge.

Caires et al. [15] are interested in type-based access control
in data-centric systems. They apply refinement types to express
permission-based security, including cases when policies
dynamically depend on the state of the database. This line of work leads to
the information-flow analysis by Lourenco and Caires [37]. This
analysis is presented as a type system with value-indexed security labels
for a λ-calculus with data manipulation primitives. The type system
is shown to enforce noninterference.

Hails by Giffin et al. [27] is a web framework for building
web applications with mandatory access control. Hails supports a
number of independently useful design patterns, such as privilege
separation, trustworthy user input, partial update, delete, and
privilege delegation.

IFDB by Schultz and Liskov [46] proposes a database
management system with decentralized information-flow control. IFDB is
implemented by modifying PostgreSQL as well as application
environments in PHP and Python. The underlying model
is the Query by Label model, which provides abstractions for
managing information flows in a relational database. This powerful model
includes confidentiality and integrity labels, and models
decentralization and declassification.

LabelFlow by Chinis et al. [17] dynamically tracks information flow in PHP. It is designed to deal with legacy applications, so it transparently extends the underlying database schema to associate information-flow labels with every row.

The SLam calculus by Heintze and Riecke [31] pioneers information-flow control in a functional setting. Its security type system treats a simple language with first-class functions, based on the λ-calculus. This is the first illustration of how noninterference can be enforced in the functional setting. Our security type system takes as its starting point the security type system by Pottier and Simonet [40], which they developed for a core of ML and which serves as the base for the Flow Caml tool [48]. Compared to that work, our system includes the formalization and implementation of algebraic data types and pattern matching. Experiments with Flow Caml indicate support for algebraic data types, but without evidence of soundness [40].
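Why pattern matching needs special treatment can be seen in a small example: selecting a branch based on the constructor of a secret value leaks information even when every branch returns a public constant. The sketch below illustrates this under stated assumptions. It is a dynamic, run-time analogue written in Python, whereas the system described here is a static type system; the two-point lattice, the `Labeled` wrapper, the `Active`/`Returned` data type, and `match_labeled` are all hypothetical names chosen for illustration. The result of a match is labeled with the join of the scrutinee's label and the chosen branch's label.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

LOW, HIGH = 0, 1  # two-point security lattice: LOW may flow to HIGH


def join(a: int, b: int) -> int:
    # Least upper bound in the two-point lattice.
    return max(a, b)


@dataclass(frozen=True)
class Labeled:
    value: object
    label: int


# Hypothetical algebraic data type: a customer's rental status.
@dataclass(frozen=True)
class Active:
    movie: str


@dataclass(frozen=True)
class Returned:
    pass


def match_labeled(scrutinee: Labeled,
                  cases: dict[type, Callable[[object], Labeled]]) -> Labeled:
    """Dispatch on the constructor of a labeled ADT value.

    Which branch runs depends on the scrutinee, so the result label is
    joined with the scrutinee's label to account for the implicit flow."""
    branch = cases[type(scrutinee.value)]
    result = branch(scrutinee.value)
    return Labeled(result.value, join(scrutinee.label, result.label))


# Both branches return public constants, yet the result reveals which
# constructor the secret status had, so it must be labeled HIGH.
secret_status = Labeled(Active("Alien"), HIGH)
has_rental = match_labeled(secret_status, {
    Active: lambda _: Labeled(True, LOW),
    Returned: lambda _: Labeled(False, LOW),
})
print(has_rental)  # Labeled(value=True, label=1)
```

A static type system achieves the same effect without running the program, typically by raising the security context of each branch to the level of the scrutinee.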

Tools such as SIF, SWIFT [20], and Fabric allow the programmer to enforce powerful confidentiality and integrity policies in web applications. The programmer labels data resources in the source program with fine-grained policies using Jif [38], an extension of Java with security types. The source program is compiled against these policies into a web application in which the policies are tracked by a combination of compile-time and run-time enforcement. The ability to enforce fine-grained policies is an attractive feature. At the same time, SIF and SWIFT do not provide database support, and Fabric supports persistent storage while leaving interoperability with databases for future work.

A final note on related work is that care has to be taken when setting security policies for sensitive databases. Narayanan and Shmatikov's widely publicized work [39] demonstrates how to de-anonymize data from Netflix's database (where names were "anonymized" by replacing them with random numbers) using publicly available external information from sources such as the Internet Movie Database [9].

7. Conclusion 

We have presented a uniform security framework for information-flow control in a functional language with language-integrated queries (with Microsoft's LINQ on the backend). Because both the host language and the embedded query languages are functional F#-like languages, we are able to leverage information-flow enforcement for functional languages to obtain information-flow control for databases "for free", synergize it with information-flow control for applications, and thus guarantee security across application-database boundaries. We have developed a security type system with a novel treatment of algebraic data types and pattern matching, and established its soundness. We have implemented the framework and demonstrated its usefulness in a case study with a realistic movie rental database.

A natural direction for future work is support for declassification [44] policies. This will enable more fine-grained labels and richer scenarios with intended information release. The functional setting allows for particularly smooth integration of policies on what [34, 35, 43] is released, where we can express aggregates through escape hatches [42], represented as functions with no side effects. We believe that enriching the model with these policies will also open up direct connections to the database inference [26] problem, much studied in the area of databases.
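To make the idea of escape hatches for aggregates concrete, here is a minimal run-time sketch in Python of a "what" policy. It is not part of the framework presented in this paper (declassification is future work), and all names (`declassify`, `ESCAPE_HATCHES`, `average`, the `prices` table) are hypothetical. The policy fixes a set of side-effect-free aggregate functions whose results may be released at the public level, while the individual secret values stay secret. Python cannot check that a listed function is actually free of side effects; that is an assumption the policy author must discharge.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

LOW, HIGH = 0, 1  # two-point security lattice


@dataclass(frozen=True)
class Labeled:
    value: object
    label: int


def average(xs: list[float]) -> float:
    # Side-effect-free aggregate; the policy trusts only this computation.
    return sum(xs) / len(xs)


# Policy: the only expressions whose results may be released at LOW.
ESCAPE_HATCHES: set[Callable[[list[float]], object]] = {average}


def declassify(hatch: Callable[[list[float]], object],
               rows: list[Labeled]) -> Labeled:
    """Release hatch(values) at LOW, but only if hatch is an approved escape hatch."""
    if hatch not in ESCAPE_HATCHES:
        raise PermissionError("not an approved escape hatch")
    return Labeled(hatch([r.value for r in rows]), LOW)


# Hypothetical table: individual rental prices are secret.
prices = [Labeled(3.5, HIGH), Labeled(4.0, HIGH), Labeled(2.5, HIGH)]
avg_price = declassify(average, prices)  # allowed: only the average is released
```

An attempt such as `declassify(lambda xs: xs[0], prices)`, which would release a single secret price, raises `PermissionError` because that function is not among the approved escape hatches.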